
Continuous Retiming

EECS 290A: Sequential Logic Synthesis and Verification

Outline
- Motivation
- Classical retiming
- Continuous retiming
- Experimental comparison

Motivation

Retiming can reduce the clock period of a circuit.

[Figure: before retiming, the critical path has delay 4; after retiming, the critical paths have delay 2.]

Motivation (cont.)

Previous algorithms for retiming require:
- computing latch-to-latch delays
- solving an ILP problem

The goal is to develop a more efficient algorithm that works directly on the circuit, without ILP.

Classical Formulation

During retiming, registers are moved across combinational nodes:

$w_r(e_{uv}) = r(v) + w(e_{uv}) - r(u)$,

where the retiming lags $r(v)$ are the numbers of registers moved from the outputs to the inputs of $v$.

For each path $p: u \rightsquigarrow v$, its weight $w(p)$ is the total number of registers on its edges, and its delay $d(p)$ is the sum of the node delays along it. The clock period is the maximum delay over all 0-weight paths:

$P = \max_{p:\, w(p) = 0} \{ d(p) \}$
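As a minimal sketch of the definition of $P$, the Python function below computes the maximum delay over 0-weight paths with a longest-path pass over the register-free edges. The name `clock_period` and the `(u, v, w)` edge-list format are assumptions of this example; it also assumes every cycle carries at least one register, so the register-free edges form a DAG.

```python
from collections import defaultdict, deque


def clock_period(delay, edges):
    """Return P = max d(p) over paths p whose edges carry no registers.

    delay: dict node -> combinational delay d(v)
    edges: list of (u, v, w) triples, w = number of registers on u -> v
    Assumes the register-free edges form a DAG (every cycle has a register).
    """
    succ = defaultdict(list)
    indeg = {v: 0 for v in delay}
    for u, v, w in edges:
        if w == 0:                      # only register-free edges matter
            succ[u].append(v)
            indeg[v] += 1
    # arrival[v]: largest delay of a 0-weight path ending at v (incl. d(v))
    arrival = dict(delay)
    queue = deque(v for v in delay if indeg[v] == 0)
    seen = 0
    while queue:                        # Kahn's topological traversal
        u = queue.popleft()
        seen += 1
        for v in succ[u]:
            arrival[v] = max(arrival[v], arrival[u] + delay[v])
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if seen != len(delay):
        raise ValueError("register-free cycle: clock period undefined")
    return max(arrival.values())
```

On a hypothetical ring a → b → c → d → a with unit delays and two registers on d → a, the critical path a → b → c → d has delay 4. The legal retiming r(a) = r(b) = 0, r(c) = r(d) = 1 puts one register on b → c and one on d → a, and the function then reports a clock period of 2.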

The matrices $W(u,v)$ and $D(u,v)$ are defined for all pairs of vertices that are connected by a path that does not go through the host node:

$W(u,v) = \min_{p:\, u \rightsquigarrow v} \{ w(p) \}$ and $D(u,v) = \max_{p:\, u \rightsquigarrow v,\; w(p) = W(u,v)} \{ d(p) \}$

C. E. Leiserson and J. B. Saxe, “Retiming synchronous circuitry,” Algorithmica, vol. 6, 1991, pp. 5-35.

Classical Formulation (cont.)

- $W(u,v)$ is the minimum latency, in clock cycles, of data flowing from $u$ to $v$.
- $D(u,v)$ is the maximum delay from $u$ to $v$ over all paths with that minimum latency.

The retiming labels for a clock period $P$ are computed by solving a linear programming (LP) problem:

$r(u) - r(v) \le w(e_{uv}), \quad \forall e_{uv} \in E$

$r(u) - r(v) \le W(u,v) - 1, \quad \forall u, v: D(u,v) > P$

The constraints ensure that after retiming:
- the latency (register count) of each edge is non-negative;
- each path whose delay is larger than the clock period has at least one register on it.
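The Leiserson/Saxe flow can be sketched in Python under stated assumptions: W and D are computed with Floyd-Warshall over lexicographic edge weights (w(e), −d(u)), and the constraints, each of the form r(u) − r(v) ≤ c, are solved with Bellman-Ford instead of a general LP solver. The function names, the `(u, v, w)` edge format, and the omission of the host node are choices of this sketch, not of the slides.

```python
import math


def compute_wd(delay, edges):
    """W and D matrices via Floyd-Warshall on weights (w(e), -d(u)).

    Lexicographic shortest paths give (W(u,v), d(v) - D(u,v)).
    Assumes every cycle carries a register; ignores the host node.
    """
    nodes = list(delay)
    inf = (math.inf, math.inf)
    dist = {(u, v): inf for u in nodes for v in nodes}
    for u in nodes:
        dist[(u, u)] = (0, 0)                  # trivial path: W = 0, D = d(u)
    for u, v, w in edges:
        dist[(u, v)] = min(dist[(u, v)], (w, -delay[u]))
    for k in nodes:
        for i in nodes:
            for j in nodes:
                a, b = dist[(i, k)], dist[(k, j)]
                dist[(i, j)] = min(dist[(i, j)], (a[0] + b[0], a[1] + b[1]))
    W, D = {}, {}
    for (u, v), (x, y) in dist.items():
        if x < math.inf:                       # u and v are connected
            W[(u, v)] = x
            D[(u, v)] = delay[v] - y
    return W, D


def retime_for_period(delay, edges, P):
    """Return integer lags r meeting both constraint families, or None.

    r(u) - r(v) <= c becomes an edge v -> u of length c; Bellman-Ford from
    a virtual source (all distances start at 0) solves the system, and a
    negative cycle means the period P is not achievable.
    """
    W, D = compute_wd(delay, edges)
    cons = [(v, u, w) for u, v, w in edges]                      # w_r(e) >= 0
    cons += [(v, u, W[(u, v)] - 1) for (u, v) in W if D[(u, v)] > P]
    r = {v: 0 for v in delay}
    for _ in range(len(delay)):
        changed = False
        for a, b, c in cons:                   # enforce r[b] <= r[a] + c
            if r[a] + c < r[b]:
                r[b] = r[a] + c
                changed = True
        if not changed:
            return r
    return None                                # negative cycle: infeasible
```

On the unit-delay ring a → b → c → d → a with two registers on d → a, `retime_for_period(delay, edges, 2)` returns feasible lags, while P = 1.5 yields None: the constraints around the cycle then sum to 0 ≤ −2.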

Implementations of Retiming

- Leiserson/Saxe: compute the matrices, generate the constraints, and then solve the LP problem.
- Shenoy/Rudell: compute the matrices one column at a time; this reduces space requirements, but the runtime is still prohibitive.
- Sapatnekar: use the retiming/skew equivalence to reduce the number of constraints generated.

S. S. Sapatnekar and R. B. Deokar, “Utilizing the retiming-skew equivalence in a practical algorithm for retiming large circuits,” IEEE Trans. CAD, vol. 15, no. 10, Oct. 1996, pp. 1237-1248.

Sapatnekar’s Retiming Algorithm

- Find ASAP and ALAP skews for a feasible clock period; use binary search to find a feasible clock period.
- Perform min-delay retiming by moving latches to fit the timing window.
- Perform min-area retiming under delay constraints by solving a reduced LP problem:
  - the reduced set of constraints is generated using the skews;
  - the LP problem is solved efficiently using a variation of the network simplex method.

Improvement: start by finding the maximum cycle ratio using Howard’s algorithm.
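The maximum cycle ratio max_C d(C)/w(C) is a lower bound on the clock period reachable by any retiming, because retiming does not change the number of registers on a cycle. The sketch below does not implement Howard's algorithm; it substitutes a simpler bisection on Φ with a Bellman-Ford test for a cycle satisfying d(C) > Φ·w(C) (Lawler's approach). The function names and the tolerance `eps` are assumptions of this example.

```python
def has_positive_cycle(delay, edges, phi, eps=1e-12):
    """True if some cycle C has d(C) > phi * w(C).

    Longest-path Bellman-Ford with edge u -> v of length d(v)/phi - w(e):
    a cycle's total length is d(C)/phi - w(C), so it is positive exactly
    when phi is below that cycle's ratio d(C)/w(C).
    """
    dist = {v: 0.0 for v in delay}          # virtual source to every node
    for _ in range(len(delay)):
        changed = False
        for u, v, w in edges:
            cand = dist[u] + delay[v] / phi - w
            if cand > dist[v] + eps:        # eps absorbs round-off
                dist[v] = cand
                changed = True
        if not changed:
            return False
    return True


def max_cycle_ratio(delay, edges, iters=60):
    """Bisection for max over cycles of d(C)/w(C); about 0 if acyclic."""
    lo, hi = 0.0, float(sum(delay.values()))  # ratio <= total delay if w(C) >= 1
    for _ in range(iters):
        mid = (lo + hi) / 2
        if mid > 0 and has_positive_cycle(delay, edges, mid):
            lo = mid
        else:
            hi = mid
    return hi
```

For the unit-delay four-node ring with two registers on the cycle, the search returns approximately 2, matching d(C)/w(C) = 4/2.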

Pan’s Algorithm

- Definitions
- Pseudo-code
- Convergence
- Improvements
- Experiments

Definitions

A circuit is an edge-weighted, node-weighted directed graph:
- the weight of a node, $d(v)$, is its combinational delay;
- the weight of an edge, $w(e)$, is its number of FFs.

Continuous retiming (c-retiming) is a retiming in which the number of latches moved is a continuous value rather than an integer.

The retimed edge weight is computed as before: $w_s(e_{uv}) = s(v) + w(e_{uv}) - s(u)$, where $s(v)$ are the continuous retiming lags.

Definitions (cont.)

Definition. A circuit is retimed to a clock period $\Phi$ by a retiming $r$ if the following two conditions are satisfied: (1) $w_r(e) \ge 0$ for each edge $e$, and (2) $w_r(p) \ge 1$ for each path $p$ such that $d(p) > \Phi$.

Definition. A circuit is c-retimed to a clock period $\Phi$ by a c-retiming $s$ if $w_s(e) \ge d(v)/\Phi$ for each edge $e: u \rightarrow v$.

The definition of c-retiming enforces non-negative edge weights, since $d(v)/\Phi \ge 0$. Summing the edge constraints along a path $p: u_1 \rightsquigarrow u_2$ gives $w_s(p) \ge (d(p) - d(u_1))/\Phi$; hence if $d(p) - d(u_1) \ge \Phi$, then $w_s(p) \ge 1$.
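A direct check of the c-retiming definition, as a small Python sketch (the function names are hypothetical): it computes $w_s(e)$ for given lags $s$ and tests $w_s(e) \ge d(v)/\Phi$ edge by edge, with a small floating-point tolerance.

```python
def c_retimed_weights(edges, s):
    """w_s(e) = s(v) + w(e) - s(u) for every edge (u, v, w)."""
    return [s[v] + w - s[u] for u, v, w in edges]


def is_c_retiming(delay, edges, s, phi, eps=1e-9):
    """Check the c-retiming condition w_s(e) >= d(v)/phi on each edge u -> v."""
    ws = c_retimed_weights(edges, s)
    return all(x >= delay[v] / phi - eps for x, (u, v, w) in zip(ws, edges))
```

For the unit-delay ring a → b → c → d → a with two registers on d → a, the lags s = (0.5, 1, 1.5, 2) give $w_s(e) = 0.5$ on every edge, so they c-retime the ring to Φ = 2 but not to Φ = 1.5. Along the path a → b → c → d, $w_s(p) = 1.5 = (d(p) - d(a))/\Phi$, and since $d(p) - d(a) = 3 \ge 2$ the path indeed carries at least one (fractional) register.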

Pseudo-code

for each node v in N do
    if (v is a PI) s(v) = 0; else s(v) = -∞;
for i = 0 to |U| + 2 do
    done = true;
    for each non-PI node v_j in N do
        tmp = max_{e: u → v_j} { s(u) - w(e) + d(v_j)/Φ };
        if (v_j is a PO and tmp > 1) return failure;
        if (s(v_j) < tmp) { s(v_j) = tmp; done = false; }
    if (done == true) return success;   // c-retiming reached a fixed point
return failure;
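The pseudo-code translates almost line for line into Python. In this sketch (an illustration, not the author's implementation) the caller supplies the relaxation order and |U|, a non-PI node with no fanins keeps $s(v) = -\infty$, failure is reported as `None`, and the function name `c_retime` is hypothetical.

```python
import math


def c_retime(delay, edges, pis, pos, phi, order, cut_size):
    """Arrival-time c-retiming following the relaxation pseudo-code.

    delay: node -> d(v); edges: (u, v, w) triples; pis/pos: sets of
    primary inputs/outputs; order: the non-PI nodes in relaxation order
    (ideally topological once the cut U is removed); cut_size: |U|.
    Returns the lags s as a dict, or None on failure.
    """
    fanin = {v: [] for v in delay}
    for u, v, w in edges:
        fanin[v].append((u, w))
    s = {v: (0.0 if v in pis else -math.inf) for v in delay}
    for _ in range(cut_size + 3):              # i = 0 .. |U| + 2
        done = True
        for v in order:
            tmp = max((s[u] - w + delay[v] / phi for u, w in fanin[v]),
                      default=-math.inf)
            if v in pos and tmp > 1:
                return None                    # a PO would need lag > 1
            if s[v] < tmp:
                s[v] = tmp
                done = False
        if done:
            return s                           # fixed point reached
    return None
```

As a usage example, take a hypothetical circuit with PI i, PO o, edges i → a (1 FF), a → b, b → c, c → d, d → a (2 FFs), d → o, unit delays on a to d and zero delay on i and o. Relaxing a, b, c, d, o with |U| = 1 at Φ = 2 reaches a fixed point in the second iteration with s = {i: 0, a: −0.5, b: 0, c: 0.5, d: 1, o: 1}; at Φ = 1.5 the PO lag would exceed 1 and the call returns `None`.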

Convergence

Theorem. Let U be a cut that breaks all the loops. If the nodes are relaxed in topological order (of the graph with its loops broken at U), the algorithm stops within at most |U| + 1 relaxation iterations when there is no positive cycle.

A positive cycle C is one with $d(C)/\Phi > w(C)$. Summing $w_s(e) \ge d(v)/\Phi$ around C gives $w(C) \ge d(C)/\Phi$, since the lags cancel, so no c-retiming exists for such a $\Phi$.

Reduction to Classical Retiming

Let $s$ be a c-retiming that achieves clock period $\Phi$. Let $r$ be the retiming defined as follows:

$$r(v) = \begin{cases} 0 & \text{if } v \text{ is a PI or PO} \\ \lceil s(v) \rceil - 1 & \text{otherwise} \end{cases}$$

Then $r$ achieves a clock period less than $\Phi + D$, where $D$ is the largest combinational delay of a node.
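The rounding from continuous lags to an integer retiming can be sketched as below, taking $r(v) = 0$ for PIs and POs and $r(v) = \lceil s(v) \rceil - 1$ otherwise. The function names and the round-off tolerance `eps` are assumptions of this example.

```python
import math


def to_retiming(s, pis, pos, eps=1e-9):
    """r(v) = 0 for PIs/POs, ceil(s(v)) - 1 otherwise.

    eps absorbs floating-point noise so that, e.g., a lag of
    1.0000000000000002 is treated as 1 before taking the ceiling.
    """
    return {v: 0 if (v in pis or v in pos) else math.ceil(x - eps) - 1
            for v, x in s.items()}


def retimed_weights(edges, r):
    """w_r(e) = r(v) + w(e) - r(u) for every edge (u, v, w)."""
    return [w + r[v] - r[u] for u, v, w in edges]
```

As an example, consider a hypothetical circuit with PI i, PO o, edges i → a (1 FF), a → b, b → c, c → d, d → a (2 FFs), d → o, unit delays on a to d and zero delay on i and o. Its c-retiming at Φ = 2 has lags s = {i: 0, a: −0.5, b: 0, c: 0.5, d: 1, o: 1}. Rounding gives r = {a: −1, b: −1, c: 0, d: 0} and non-negative retimed weights (0, 0, 1, 0, 1, 0). The remaining register-free paths i → a → b and c → d → o both have delay 2, below Φ + D = 3.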

Area Minimization

The problem of minimizing the number of (fractional) FFs subject to a given clock period $\Phi$ is an LP:

minimize $\sum_e c\, w_s(e)$ subject to $w_s(e) \ge d(v)/\Phi$ for each edge $e: u \rightarrow v$.

The dual of this problem is an uncapacitated min-cost flow problem:
- the flow graph is the circuit graph itself;
- the flow out of each node is the difference between its fanout count and its fanin count;
- the cost of an edge is $w_1(e) = -w(e) + d(v)/\Phi$.
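The fanout and fanin counts in the dual come from how the objective depends on the lags. Assuming a uniform cost c = 1 per FF (an assumption of this sketch), the total $\sum_e w_s(e)$ regroups as $\sum_e w(e) + \sum_v s(v)\,(\mathrm{fanin}(v) - \mathrm{fanout}(v))$. Each lag therefore enters the objective with coefficient fanin(v) − fanout(v), which, up to sign convention, is the node imbalance of the dual flow problem. The two hypothetical functions below compute both forms.

```python
from collections import Counter


def total_ffs(edges, s):
    """Sum of c-retimed weights w_s(e) = s(v) + w(e) - s(u) (unit cost)."""
    return sum(s[v] + w - s[u] for u, v, w in edges)


def total_ffs_by_node(edges, s):
    """Same total regrouped per node: sum w(e) + sum_v s(v)*(fanin - fanout)."""
    fanin, fanout = Counter(), Counter()
    for u, v, _ in edges:
        fanout[u] += 1
        fanin[v] += 1
    base = sum(w for _, _, w in edges)
    return base + sum(x * (fanin[v] - fanout[v]) for v, x in s.items())
```

As an example, take a hypothetical circuit with edges i → a (1 FF), a → b, b → c, c → d, d → a (2 FFs), d → o and the lags s = {i: 0, a: −0.5, b: 0, c: 0.5, d: 1, o: 1}. Both functions give 2.5 fractional FFs, compared with 3 FFs before retiming.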

Improvements

- Perform a “required time” c-retiming in addition to the “arrival time” c-retiming.
- Retime over circuits with choice nodes, combining logic synthesis and c-retiming.
- Heuristically minimize area, which leads to faster computation than solving an ILP.

Experimental Results

The following three algorithms are compared:
- P. Pan (ICCD ’96)
- Sapatnekar/Deokar (TCAD ’96)
- Maheshwari/Sapatnekar (TVLSI ’98)

P. Pan (ICCD ’96)

CPU time is measured on a Sparc 5.

Sapatnekar/Deokar (TCAD ’96)

CPU time is measured on an HP 735 workstation.

Maheshwari/Sapatnekar (TVLSI ’98)

CPU time is measured on a DEC AXP system 3000/900 workstation.

Conclusions

- Presented an alternative approach to retiming
- Compared it with other methods
- Proposed several improvements
