retiming and re-synthesis
DESCRIPTION
Retiming and Re-synthesis. Outline: Retiming Retiming and Resynthesis (RnR) Resynthesis of Pipelines. Optimizing Sequential Circuits by Retiming Netlist of Gates. Netlist of gates and registers: Various Goals: Reduce clock cycle time Reduce area Reduce number of latches. Inputs. - PowerPoint PPT PresentationTRANSCRIPT
1
Retiming and Re-synthesisRetiming and Re-synthesis
Outline:Outline:• RetimingRetiming• Retiming and Resynthesis (RnR) Retiming and Resynthesis (RnR) • Resynthesis of PipelinesResynthesis of Pipelines
2
Optimizing Sequential Optimizing Sequential Circuits by Retiming Netlist of Circuits by Retiming Netlist of
GatesGatesNetlist of gates and registers:Netlist of gates and registers:
Various Goals:Various Goals:– Reduce clock cycle timeReduce clock cycle time– Reduce areaReduce area
• Reduce number of latchesReduce number of latches
InputsInputs
OutputsOutputs
3
RetimingRetimingProblemProblem
– Pure combinational optimization can be Pure combinational optimization can be myopicmyopic since since relations across register boundaries are disregardedrelations across register boundaries are disregarded
SolutionsSolutions– RetimingRetiming: Move register(s) so that : Move register(s) so that
• clock cycle decreases, or number of registers decreases andclock cycle decreases, or number of registers decreases and• input-output behavior is preservedinput-output behavior is preserved
– RnRRnR: Combine retiming with combinational : Combine retiming with combinational optimization techniques optimization techniques • Move latches out of the way temporarilyMove latches out of the way temporarily• optimize larger blocks of combinationaloptimize larger blocks of combinational
4
Circuit RepresetationCircuit Represetation[Leiserson, Rose and Saxe (1983)][Leiserson, Rose and Saxe (1983)]Circuit representation: Circuit representation: G(V,E,d,w)G(V,E,d,w)
– V V set of gates set of gates– E E set of wires set of wires– d(v) = delay of gate/vertex v, d(v) = delay of gate/vertex v, (d(v)(d(v)0)0)– w(e) = number of registers on edge e, w(e) = number of registers on edge e,
(w(e)(w(e)0)0)
5
Circuit RepresentationCircuit RepresentationExampleExample:: Correlator Correlator
CircuitCircuit
(x, y) = 1 if x=y(x, y) = 1 if x=y0 otherwise0 otherwise
Operation delayOperation delay
33
++ 7 7Every cycle in Graph has at least one Every cycle in Graph has at least one register i.e. no combinational loops.register i.e. no combinational loops.
00
33 33
00
0000
0022
Graph (Directed)Graph (Directed)
77
aa bb
++
HostHost
6
PreliminariesPreliminariesFor a path For a path p p ::
Clock cycleClock cycle
1
0
0
)()(
)()(k
ii
k
ii
ewpw
vdpd endpoints) (includes
: ( ) 0max { ( )}p w p
c d p
For correlator c = 13For correlator c = 13
Path with Path with w(p)=0w(p)=000
33 33
00
0000
0022
77
0 11
0 1 1
ke ee
k kv v v v
7
• Movement of registers from input to output of a Movement of registers from input to output of a gate or vice versagate or vice versa
• Does not affect gate functionalitiesDoes not affect gate functionalities• A mathematical definition: A mathematical definition: retardationretardation
– rr: V : V Z, an integer Z, an integer vertexvertex labeling labeling– wwrr(e) = w(e) + r(v) - r(u)(e) = w(e) + r(v) - r(u) for edge e = (u,v) for edge e = (u,v)
Basic OperationBasic Operation
Retime by 1
Retime by -1
8
Thus in the example, r(Thus in the example, r(uu) = -1, r() = -1, r(vv) = -1 ) = -1 results inresults in
• For a path p: sFor a path p: st, wt, wrr(p) = w(p) + r(t) - r(s)(p) = w(p) + r(t) - r(s)• Retardation Retardation
– r: Vr: VZ, an integer vertex labelingZ, an integer vertex labeling– wwrr(e) =w(e) + r(v) - r(u) for edge e= (u,v)(e) =w(e) + r(v) - r(u) for edge e= (u,v)– A retiming r is A retiming r is legallegal if w if wrr(e) (e) 0, 0, eeEE
Basic OperationBasic Operation
vvuu00
33 33
00
0000
0022
77
vvuu00
33 33
00
1111
0011
77
9
Retiming for minimum clock cycleRetiming for minimum clock cycleProblem StatementProblem Statement: (minimum cycle time): (minimum cycle time)
Given G (V, E, d, w), find a legal retiming Given G (V, E, d, w), find a legal retiming rr so so that that
is minimizedis minimizedRetiming: Retiming: 2 important matrices2 important matrices
• Register weight matrixRegister weight matrix
• Delay matrixDelay matrix
: ( ) 0max { ( )}rp w p
c d p
( , ) min{ ( ) : }p
pW u v w p u v
( , ) max{ ( ) : , ( ) ( , )}p
pD u v d p u v w p w u v
10
Retiming for minimum clock Retiming for minimum clock cyclecycle
WWV0 V1 V2 V3V0 V1 V2 V3
V0V0V1V1V2V2V3V3
0 2 2 20 2 2 20 0 0 00 0 0 00 2 0 00 2 0 00 2 2 00 2 2 0
c c p, if d(p) p, if d(p) then w(p) then w(p) 1 1
DDV0 V1 V2 V3V0 V1 V2 V3
V0V0V1V1V2V2V3V3
0 3 6 0 3 6 13131313 3 6 3 6
131310 1310 13 3 3
10107 7 10 1310 13 7 7
V2V2V1V1
v0v0 00
33 33
00
0000
0022
77W = register W = register pathpath
weight matrixweight matrix (minimum # latches(minimum # latches on all paths betweenon all paths between u and v)u and v)D = D = pathpath delay matrix delay matrix (maximum delay on(maximum delay on all paths betweenall paths between u and v)u and v)
11
Conditions for RetimingConditions for RetimingAssume that we are asked to check if a retiming exists for a Assume that we are asked to check if a retiming exists for a
clock cycle clock cycle Legal retiming: wLegal retiming: wrr(e) (e) 0 for all e. Hence 0 for all e. Hence
wwrr(e) = w(e) = r(v) - r(u) (e) = w(e) = r(v) - r(u) 0 or 0 orr (u) - r (v) r (u) - r (v) w (e) w (e)
For all paths p: u For all paths p: u v such that d(p) v such that d(p) , we require w, we require wrr(p) (p) 1 1– ThusThus
1
0
1
10
0
1 ( ) ( )
[ (
( ) ( ) (
) ( ) ( )]
( ) ( ))
( )
k
r r ii
k
i i ii
k
w p w e
w e r v
w p r
r v
w p r v vr ur
v
Or take the least w(p) Or take the least w(p) (tightest constraint)(tightest constraint) r(u)-r(v) r(u)-r(v) W(u,v)-1 W(u,v)-1NoteNote: this is independent of the path from u to v, so we just need to apply : this is independent of the path from u to v, so we just need to apply it to u, v such that D(u,v) it to u, v such that D(u,v)
12
• All constraints in difference-of-2-variable form All constraints in difference-of-2-variable form • Related to shortest path problemRelated to shortest path problem
Solving the constraintsSolving the constraints
Correlator: Correlator: = 7 = 7
Legal: r(u)-r(v)Legal: r(u)-r(v)w(e)w(e)
0)()(0)()(0)()(0)()(2)()(
03
32
31
21
10
vrvrvrvrvrvrvrvrvrvr
1)()(1)()(
1)()(1)()(
1)()(1)()(1)()(
1)()(
23
13
32
12
02
31
01
30
vrvrvrvrvrvrvrvrvrvrvrvrvrvrvrvr
D>7:D>7:r(u)-r(v)r(u)-r(v)W(u,v)-1W(u,v)-1
V2V2v1v1
v0v0 00
33 33
00
0000
0022
77
WWV0 V1 V2 V3V0 V1 V2 V3
V0V0V1V1V2V2V3V3
0 2 2 20 2 2 20 0 0 00 0 0 00 2 0 00 2 0 00 2 2 00 2 2 0
DDV0 V1 V2 V3V0 V1 V2 V3
V0V0V1V1V2V2V3V3
0 3 6 0 3 6 13 13 1313 3 6 3 6
131310 1310 13 3 3
10107 7 10 1310 13 7 7
13
• Do shortest path on constraint graph: (O(|V|Do shortest path on constraint graph: (O(|V|3 3 )).)).• A solution exists if and only if there exists A solution exists if and only if there exists no negative no negative
weighted cycleweighted cycle..
Solving the constraintsSolving the constraints
Legal: r(u)-r(v)Legal: r(u)-r(v)w(e)w(e)
0)()(0)()(0)()(0)()(2)()(
03
32
31
21
10
vrvrvrvrvrvrvrvrvrvr
1)()(1)()(
1)()(1)()(
1)()(1)()(1)()(
1)()(
23
13
32
12
02
31
01
30
vrvrvrvrvrvrvrvrvrvrvrvrvrvrvrvr
D>7:D>7:r(u)-r(v)r(u)-r(v)W(u,v)-1W(u,v)-1
A solution is r(vA solution is r(v00) = r(v) = r(v33) = 0, r(v) = 0, r(v11) = r(v) = r(v22) = -1) = -1
r(1)r(1)r(0)r(0)
r(3)r(3)r(2)r(2)
00
11 11
11
11
11
-1-1
-1-1
-1-1
0,-10,-1
0,-10,-1
00
00-1-1
22
Constraint graphConstraint graph
14
RetimingRetimingTo find theTo find the minimum cycle time, do a binary search minimum cycle time, do a binary search among the entries of the D matrix among the entries of the D matrix (0((0(VV33 loglogVV))))
RetimeRetimeRetimed correlator:Retimed correlator:
Clock cycleClock cycle = 3+3+7=13= 3+3+7=13 Clock cycle = 7Clock cycle = 7
V2V2v1v1
v0v0 00
33 33
00
0000
0022
77
aa bb
++
HostHost
aa bb
++
HostHost
WWV0 V1 V2 V3V0 V1 V2 V3
V0V0V1V1V2V2V3V3
0 2 2 20 2 2 20 0 0 00 0 0 00 2 0 00 2 0 00 2 2 00 2 2 0
DDV0 V1 V2 V3V0 V1 V2 V3
V0V0V1V1V2V2V3V3
0 3 6 0 3 6 13131313 3 6 3 6
131310 1310 13 3 3
10107 7 10 1310 13 7 7
15
1. Relaxation1. Relaxation based:based: – Repeatedly find critical path; Repeatedly find critical path; – retime vertex at end of path by +1 retime vertex at end of path by +1 (O((O(VVEEloglogVV))))
2. Also, Mixed Integer Linear Program formulation2. Also, Mixed Integer Linear Program formulation
Retiming: 2 more algorithmsRetiming: 2 more algorithms
++11
uuCritical pathCritical path
vv
16
Retiming for minimum areaRetiming for minimum area(minimum # latches)(minimum # latches)
Goal:Goal: minimize number of registers usedminimize number of registers used
:
:
min ( )
( ( ) ( ) ( ))
( ) ( ( ) ( ))
( ( ) ( ))
( )(# ( ) # ( )
( )
r re E
e u v
e E e u v
u v
v V
Vv V
N w e
w e r v r u
w e r v r u
N r v r u
N r v fanin v fanout v
N a r v
where where aavv is a constant. is a constant.
17
Minimize: Minimize:
Minimum registers - Minimum registers - formulationformulation
( )vv V
a r vSubject to:Subject to: wwrr(e) =w(e) + r(v) - r(u) (e) =w(e) + r(v) - r(u) 0 0
• Reducible to a Reducible to a flow problemflow problem
18
Retiming and resynthesis: Retiming and resynthesis: motivationmotivation
Goal:Goal: incorporate combinational optimization into sequential optimization incorporate combinational optimization into sequential optimization• Naïve approachNaïve approach: carve out combinational regions, do optimization on each region. : carve out combinational regions, do optimization on each region.
Only local gains made.Only local gains made.• Can we do any better?Can we do any better?
Sentovich, Malik, Brayton andSangiovanni-Vincentelli (‘89)Sentovich, Malik, Brayton andSangiovanni-Vincentelli (‘89)3 step approach3 step approach
1.1. Move registers to Move registers to boundaryboundary of circuit of circuit2.2. OptimizeOptimize network network3.3. Move registers Move registers backback in an optimal way in an optimal way
RnR: a new approachRnR: a new approach
19
RnR: circuit representationRnR: circuit representation
Circuit representation: Circuit representation: communication graphcommunication graph– internal/peripheral edgesinternal/peripheral edges– edge-weight = register countedge-weight = register count
cc
aa bb11 11
11 22
00
ii jjinternalinternalperipheralperipheral
20
Extended RetimingExtended Retiming• Move register to the Move register to the peripheryperiphery• NegativeNegative edge-weights permitted edge-weights permitted
• A negative latch has the interpretation that A negative latch has the interpretation that it it advancesadvances its output by 1 clock cycle its output by 1 clock cycle instead of delaying itinstead of delaying it
-1-1
11
negativenegativelatchlatch
21
Peripheral RetimingPeripheral RetimingA retiming is called a A retiming is called a peripheralperipheral retiming if it results retiming if it results
in all internal edges having in all internal edges having zerozero weight weightPeripheral edges can have Peripheral edges can have negativenegative weight weight
11
All internal edge All internal edge weights are 0weights are 0
11
22 nn
22nn
kk kk
peripheralperipheralweightsweights
22
Peripheral RetimingPeripheral RetimingA circuit that undergoes peripheral retiming followed by A circuit that undergoes peripheral retiming followed by
a a legallegal retiming, i.e. one that results in all weights retiming, i.e. one that results in all weights 0 is “0 is “functionally equivalentfunctionally equivalent”” to the original circuit to the original circuit
Functional equivalenceFunctional equivalence: equivalence of finite state : equivalence of finite state automata: automata: But we have to be careful about the But we have to be careful about the initialinitial conditions and conditions and
initializinginitializing sequences. The resulting circuit may only exhibit sequences. The resulting circuit may only exhibit equivalence equivalence afterafter an appropriate delay. See [Singhal et.al an appropriate delay. See [Singhal et.al ICCAD 1995]. ICCAD 1995].
This holds even for regular retimingThis holds even for regular retiming
23
Peripheral Retiming - an Peripheral Retiming - an exampleexample
cc
aa bb00 11
00 00
22
ii jj
Peripheral retimingPeripheral retiming
Not possible for all circuits:Not possible for all circuits:
cc
aa bb11 11
11 22
00
ii jj
ccaa
bb
00ii
jj dd dd
00 00
00 11 00 00
0000
o1o1
o2o2
24
Path Weight Matrix (PWM)Path Weight Matrix (PWM)Matrix WMatrix W
– rows:rows: inputsinputs– columns:columns: outputsoutputs
aa bb11
ii jj kk
11
o1o1 o2o2
o1 o2o1 o2II 1 *1 *jj 0 ~ 0 ~kk * 0 * 0
weightsdifferent withpaths if paths all for weightsame
ji path no
~)(
*
jiij ewW
25
PWM and Peripheral RetimingPWM and Peripheral RetimingSatisfiableSatisfiable path weight matrix: path weight matrix:
1. W1. Wij ij ~, ~, i, i, j andj and
2. 2. ii, , jj, such that W, such that Wijij= = I I + + jj , , WWij ij * *
Peripheral retiming possible Peripheral retiming possible matrix is satisfiable matrix is satisfiable ii, , jj specify registers on peripheral edge specify registers on peripheral edge
– some some ii, , jj can be can be negativenegative
ComplexityComplexity: : linearlinear in size of communication graph in size of communication graph
26
Path Weight Matrix - generationPath Weight Matrix - generationFrom inputs to outputs:From inputs to outputs:
generate output columns of W. generate output columns of W.
Example:Example:
aa bb11
ii jj kk
11
o1o1 o2o2[ * ~ 0 ][ * ~ 0 ]
[ 1 0 * ][ 1 0 * ]
[ 1 0 * ][ 1 0 * ]
[ 0 * * ][ 0 * * ] [ * 0 * ][ * 0 * ] [ * * 0 ][ * * 0 ]
([ 0 * * ]+1) & ([ * 0 * ]+0)([ 0 * * ]+1) & ([ * 0 * ]+0)= [ 1 * * ] & [ * 0 * ]= [ 1 * * ] & [ * 0 * ]= [ 1 0 * ]= [ 1 0 * ]
([ * 0 * ]+0) & ([ * * 0 ]+0)([ * 0 * ]+0) & ([ * * 0 ]+0)= [ * 0 * ] & [ * * 0 ]= [ * 0 * ] & [ * * 0 ]= [ * 0 0 ]= [ * 0 0 ]
( [ * 0 * ]+1) & [ * 0 0 ]( [ * 0 * ]+1) & [ * 0 0 ]= [ * 1 * ] & [ * 0 0 ]= [ * 1 * ] & [ * 0 0 ]= [ * ~ 0 ]= [ * ~ 0 ]
o1 o2o1 o2I I 1 *1 *j j 0 ~ 0 ~kk * 0 * 0
Paths from inputsPaths from inputsto nodeto node
27
Computing Computing , , Each constraint of the form Each constraint of the form I I + + jj = W = Wijij
ProcedureProcedure1.1. set set 11 = 0 = 02.2. use first row to generate use first row to generate ‘s‘s3.3. determine determine ‘s from ‘s from ‘s‘s4.4. check for consistencycheck for consistency
Example:Example:
o o ii 2 2j j 3 3
cc
aa bb1/1/00 1/1/11
1/1/00 2/2/00
0/0/22
ii jj
oo
11 = = 0022 = = 1111 = = 22
28
Computing Computing ,,
Example:Example:
o1 o2 o1 o2 ii 0 0 0 0j j 0 1 0 1
1+ 1+ 1 = 0 1 = 0 2+ 2+ 1 = 0 1 = 0
1+ 1+ 2 = 0 2 = 0 2+ 2+ 2 = 12 = 1------------------------------ ------------------------------
1- 1- 2 = 0 2 = 0 1- 1- 2 = -12 = -1
ccaa
bb
00ii
jj dd dd
00 00
00 11 00 00
0000
o1o1
o2o2
contradictioncontradiction
29
Optimizing Acyclic Sequential Optimizing Acyclic Sequential CircuitsCircuits
Acyclic circuits with Acyclic circuits with satisfiablesatisfiable W W– Do Do peripheralperipheral retiming putting retiming putting ii,,jj registers at the I/O registers at the I/O– ResynthesizeResynthesize interior interior– Do a Do a legallegal retiming retiming (move registers in)(move registers in)
• May May notnot always be possible always be possible
Acyclic circuits with Acyclic circuits with unsatisfiableunsatisfiable W W– IdentifyIdentify maximal sub-circuits with satisfiable W maximal sub-circuits with satisfiable W– CutCut connections connections– RepeatRepeat previous procedure previous procedure
30
Acyclic Sequential CircuitsAcyclic Sequential Circuits
Note that Note that bothboth of the cut circuits can be peripherally of the cut circuits can be peripherally retimed, unlike the original.retimed, unlike the original.
cutcutcircuitscircuits
31
Optimizing Cyclic Sequential Optimizing Cyclic Sequential CircuitsCircuits
Cyclic circuitsCyclic circuits1.1. Make circuit Make circuit acyclicacyclic by breaking cycles by breaking cycles
– Different feedback-cuts give Different feedback-cuts give differentdifferent W’s W’s2.2. Find sub-circuits with Find sub-circuits with satisfiablesatisfiable W’s W’s3.3. RepeatRepeat procedure procedure
32
FSM OptimizationFSM Optimization
AA BB
OUTOUT
BB
OUTOUT
AA
Break feedbackBreak feedback
XX XX
YYYY
XX
YY
33
FSM OptimizationFSM OptimizationPeripheral retimingPeripheral retiming
ResynthesizeResynthesize
BB
OUTOUT
AA
BB
OUTOUT
AA
XX
YY
XX
YY
XX
YY
XXYY
combinationalcombinational
34
FSM OptimizationFSM OptimizationRetimeRetime
ReconnectReconnect
BB
OUTOUT
AA
BB
OUTOUT
AA
XX
YY
XX
YY
XX
YY
35
FSM OptimizationFSM Optimization
Resynthesized circuitResynthesized circuit
BB
OUTOUT
AA
AA BB
OUTOUT
Original circuitOriginal circuit
36
Resynthesis of PipelinesResynthesis of PipelinesGoal:Goal: Performance optimization of pipeline Performance optimization of pipeline
circuitscircuits
Example:Example: Pipeline circuit C Pipeline circuit CCCnn
CCii
CC11
OOnn
OOii
OO11
IInn
IIii
II11
CCnn
CCii
CC11
OOnn
OOii
OO11
IInn
IIii
II11
n -1n -1
i-1i-1
-(n -1)-(n -1)
-(i-1)-(i-1)
PeripheralPeripheralRetimingRetiming
CombinationalCombinationalcircuitcircuit
37
Resynthesis of PipelinesResynthesis of Pipelines
Parameters:Parameters:– AA vector of vector of arrivalarrival times for inputs times for inputs– RR vector of vector of requiredrequired times for outputs times for outputs– cc target target clockclock cycle cycle– CC circuitcircuit
PipelinePipeline performance problem: performance problem: PPPP(C(CPP,c,cPP,A,APP,R,RPP))
CombinationalCombinational performance problem: performance problem:PPCC(C(CCC,c,cCC,A,ACC,R,RCC))
38
Resynthesis of PipelinesResynthesis of Pipelines
Problem Transformation:Problem Transformation:– RelaxRelax PPPP(C(CPP,c,cPP,A,APP,R,RPP) to ) to P P ‘‘PP(C(CPP,c,cpp++,A,APP,R,RPP) )
• where where = largest possible single gate delay = largest possible single gate delay– ConvertConvert: pipeline : pipeline P P ‘‘PP to combinational problem to combinational problem P P ‘‘CC
peripheral retime
( 1)
( 1)
i i
i i
P C
stage stageC P
stage stageC P
C C
R i c R
A i c A
39
Perfomance Synthesis of Perfomance Synthesis of PipelinePipeline
• Arrival and Required Times for Arrival and Required Times for P P ‘‘CC : :CCnn
CCii
CC11
OOnn
OOii
OO11
IInn
IIii
II11
PeripheralPeripheralRetimingRetiming
CCnn
CCii
CC11
OOnn
OOii
OO11
IInn
IIii
II11
(i-1)c+A(i-1)c+Aii
CombinationalCombinationalcircuitcircuit
(i-1)c+R(i-1)c+Rii
Theoretical ContributionTheoretical Contribution::– If If P P ‘‘CC((ccpp++)) has a solution then retiming yields a solution to has a solution then retiming yields a solution to P P ‘‘P P ((ccpp++)) – If there is a solution to If there is a solution to PPP P ((ccpp)) then peripheral retiming yields a solution to then peripheral retiming yields a solution to P P
‘‘C C ((ccpp++))