
Optimal Sparseness in Binary Adders

ARITH 22 Lyon, France

2015

Outline

• Parallel Adders
  – Structural features
  – Recurrence algorithms
    • Weinberger
    • Ling
  – Minimum depth structures
    • Kogge-Stone
    • Ladner-Fischer
• Sparse Adders
  – Sparse adders in literature
• Energy Optimal Sparseness
  – Limits on sparseness
  – Effect of increased sparseness on adder energy
• Implementation results
• Conclusion

Parallel Adder Structure

Structural Features of Parallel Adders

• Logic Depth (LD): maximum number of stages from input to output

• Prefix (P): number of signals (or maximum fan-in) processed at each stage.
  – Prefix 2 means two signals are processed in a node.
  – Logic depth changes depending on the prefix.

• Minimum possible number of stages: $LD_{min} = \log_R N$ (N-bit adder, prefix R).
  – For N = 64: $LD_{min} = 6$ for prefix 2, $LD_{min} = 3$ for prefix 4.

• Fan-out (F): the maximum number of logical branches in the prefix tree.

• Wiring Complexity (WC): The maximum number of wire tracks passing along a bit-pitch of the technology in any stage of the prefix tree.
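As a quick check of the minimum-depth figures above, the sketch below counts prefix stages with integer arithmetic. The function name `min_logic_depth` is hypothetical, and rounding up for widths that are not an exact power of the prefix is an assumption that the slides do not state.

```python
def min_logic_depth(n_bits: int, prefix: int) -> int:
    """Smallest number of stages d such that prefix**d >= n_bits."""
    if n_bits < 1 or prefix < 2:
        raise ValueError("need n_bits >= 1 and prefix >= 2")
    depth, span = 0, 1
    # Each stage multiplies the number of bits one node can cover by `prefix`.
    while span < n_bits:
        span *= prefix
        depth += 1
    return depth


for r in (2, 4):
    print(f"N=64, prefix {r}: LDmin = {min_logic_depth(64, r)}")
```

For N = 64 this reproduces the depths of 6 (prefix 2) and 3 (prefix 4) quoted on the slide.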

Recurrence Algorithms

Weinberger, Ling

Minimum Depth Adders

Kogge-Stone
- Minimum depth ($\log_2 N$)
- Minimum fan-out (2)
- Maximum wiring ($N/2$)

P.M. Kogge and H.S. Stone, "A parallel algorithm for the efficient solution of a general class of recurrence equations", IEEE Trans. Computers, vol. C-22, no. 8, pp. 786-793, Aug. 1973.

Ladner-Fischer
- Minimum depth ($\log_2 N$)
- Maximum fan-out ($N/2$)
- Minimum wiring (1)

R.E. Ladner and M.J. Fischer, "Parallel Prefix Computation", JACM, vol. 27, no. 4, pp. 831-838, Oct. 1980.
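To make the Kogge-Stone structure concrete, here is a minimal bit-level sketch of a prefix-2 Kogge-Stone adder. It is a behavioural model written for illustration, not the authors' implementation; the function name `kogge_stone_add` and the absence of a carry-in are assumptions.

```python
def kogge_stone_add(a: int, b: int, n_bits: int = 8) -> int:
    """Add two n_bits-wide unsigned integers with a Kogge-Stone prefix tree."""
    # Bitwise generate and propagate signals; index 0 is the least significant bit.
    g = [(a >> i) & (b >> i) & 1 for i in range(n_bits)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n_bits)]

    G, P = g[:], p[:]
    dist = 1
    while dist < n_bits:
        # One prefix stage: position i merges with the group ending at i - dist.
        G_new, P_new = G[:], P[:]
        for i in range(dist, n_bits):
            G_new[i] = G[i] | (P[i] & G[i - dist])
            P_new[i] = P[i] & P[i - dist]
        G, P = G_new, P_new
        dist *= 2  # the wire span doubles at every stage

    # Carry into bit i is the group generate of bits i-1..0 (no carry-in assumed).
    carries = [0] + G[:-1]
    total = 0
    for i in range(n_bits):
        total |= (p[i] ^ carries[i]) << i
    return total | (G[n_bits - 1] << n_bits)  # append the carry-out


print(kogge_stone_add(0b1011, 0b0110, n_bits=4))
```

Each node reads only two inputs from the previous stage, which corresponds to the fan-out of 2 listed above, while the last stage connects bits that are N/2 positions apart.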

SPARSE ADDER

Sparse Adder Structure

• Critical path in prefix adder
  – Sum block: 1 gate
  – Carry block: $1+\log_2 N$ gates
• The critical path length cannot be reduced beyond $\log_2 N$; however, complexity can be moved to the less critical sum block.
• Solution: Sparse adder
  – Generate every Mth carry signal
  – Pre-compute sum signals for missing carry signals
  – Select true sum signal based on computed carry signals
• Dilutes carry block, complicates sum block
• Saves area and power without changing critical path length
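The three steps of the sparse adder listed above can be sketched behaviourally as follows. This is an illustrative model under stated assumptions: the names `sparse_carries` and `sparse_add` are hypothetical, the sparse carry tree is replaced by a direct arithmetic computation of every Mth carry, and the width is assumed to be a multiple of M.

```python
def sparse_carries(a: int, b: int, n_bits: int, m: int) -> list:
    """Carry into every m-th bit position (stand-in for the sparse carry tree)."""
    carries = []
    for lo in range(0, n_bits, m):
        low_mask = (1 << lo) - 1
        carries.append(((a & low_mask) + (b & low_mask)) >> lo)
    return carries


def sparse_add(a: int, b: int, n_bits: int = 16, m: int = 4) -> int:
    """Behavioural model of a sparse-m adder; n_bits must be a multiple of m."""
    if n_bits % m:
        raise ValueError("n_bits must be a multiple of m")
    mask = (1 << m) - 1
    carries = sparse_carries(a, b, n_bits, m)  # step 1: every m-th carry
    result, chosen = 0, 0
    for blk, carry_in in enumerate(carries):
        lo = blk * m
        a_blk, b_blk = (a >> lo) & mask, (b >> lo) & mask
        # Step 2: pre-compute both conditional sums for this m-bit group.
        sum_if_0 = a_blk + b_blk
        sum_if_1 = a_blk + b_blk + 1
        # Step 3: select the true sum with the carry from the carry block.
        chosen = sum_if_1 if carry_in else sum_if_0
        result |= (chosen & mask) << lo
    return result | ((chosen >> m) << n_bits)  # carry-out of the last group


print(sparse_add(0xBEEF, 0x1234, n_bits=16, m=4))
```

The carry block only has to deliver one signal per group of M bits, while each group needs its own small adder pair and a selector, which is the trade-off the slide describes.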

Prefix Graphs for Sparse Adders

SPARSE ADDERS IN LITERATURE

Conditional Sum (COS) Adder

J. Sklansky, "Conditional-Sum Addition Logic," IRE Trans. Electronic Computers, vol. EC-9, no. 2, pp. 226-231, June 1960.

32-bit prefix 2 COS adder prefix scheme.

Carry Select (CSL) Adder

O.J. Bedrij, "Carry-Select Adder," IRE Trans. Electronic Computers, vol. EC-11, no. 3, pp. 340-346, June 1962.

64-bit prefix 4 sparse 4 CSL adder prefix scheme.

Sparse Adder [Mathew, 2003]

S. Mathew, M. Anders, R.K. Krishnamurthy, and S. Borkar, "A 4-GHz 130-nm address generation unit with 32-bit sparse-tree adder core," IEEE Journal of Solid-State Circuits, vol. 38, no. 5, pp. 689-695, May 2003.

32-bit prefix 2 sparse 4 LF prefix scheme Weinberger adder

ENERGY OPTIMAL SPARSENESS

Carry Tree Sparseness

• Sparse carry trees reduce energy in parallel adders

• The energy improvement comes from reduced carry-path complexity: less wiring and fewer gates.

• A certain amount of complexity is moved to the sum path, which implies a limit on the sparseness of the carry tree.

Carry Tree Sparseness cont.

• Making the carry tree sparse does not change the critical path length of the carry block.
• However, it increases the critical path length of the sum block.
• The critical path length of the carry block for an N-bit Ling adder using prefix 2 computations is $\log_2 N$.
• A sparse M adder uses M-bit parallel adders in the sum block to compute conditional sum signals.
• Hence, the critical path length of the sum block is $2+\log_2 M$.

Limit on Sparseness

• Weinberger recurrence
  – Carry critical path: $1+\log_2 N$
  – Sum critical path: $2+\log_2 M$
  – $2+\log_2 M \le 1+\log_2 N \Rightarrow M \le N/2$

• Ling recurrence
  – Carry critical path: $\log_2 N$
  – Sum critical path: $2+\log_2 M$
  – $2+\log_2 M \le \log_2 N \Rightarrow M \le N/4$
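The two bounds above follow from requiring that the sum path is no longer than the carry path. The sketch below solves that inequality for power-of-two widths; the function name `max_sparseness` and the string keys are hypothetical choices made for this example.

```python
def max_sparseness(n_bits: int, recurrence: str) -> int:
    """Largest M with sum path (2 + log2 M) <= carry path, for power-of-two N."""
    log2_n = n_bits.bit_length() - 1
    if n_bits < 1 or (1 << log2_n) != n_bits:
        raise ValueError("n_bits must be a power of two")
    # Carry critical path lengths as given on the slide.
    carry_path = {"weinberger": 1 + log2_n, "ling": log2_n}[recurrence]
    log2_m = carry_path - 2  # from 2 + log2(M) <= carry_path
    if log2_m < 0:
        raise ValueError("adder too narrow for a sparse carry tree")
    return 1 << log2_m


for name in ("weinberger", "ling"):
    print(name, max_sparseness(64, name))
```

For a 64-bit adder this gives M ≤ 32 (N/2) for Weinberger and M ≤ 16 (N/4) for Ling.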

SUM PATH DESIGN IN A SPARSE ADDER

Sum Path

[Figure: sum path for the Weinberger and Ling recurrences]

Ling: $c_i = t_{i-1} h_{i-1}$
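The relation between the true carry and the Ling pseudo-carry can be checked with a small model. The sketch assumes the usual definitions $g_i = a_i b_i$, $t_i = a_i + b_i$ and $h_i = g_i + t_{i-1} h_{i-1}$ with no carry-in; this index convention is an assumption, since the slide shows only the final relation. The pseudo-carries are evaluated serially here for clarity, whereas an actual Ling adder would compute them with a prefix tree.

```python
def ling_add(a: int, b: int, n_bits: int = 8) -> int:
    """Add two unsigned integers using Ling pseudo-carries h_i."""
    g = [(a >> i) & (b >> i) & 1 for i in range(n_bits)]    # generate
    t = [((a >> i) | (b >> i)) & 1 for i in range(n_bits)]  # transmit
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(n_bits)]  # half sum

    # Pseudo-carry recurrence: h_i = g_i + t_{i-1} h_{i-1}, with h_0 = g_0.
    h = [0] * n_bits
    h[0] = g[0]
    for i in range(1, n_bits):
        h[i] = g[i] | (t[i - 1] & h[i - 1])

    # True carry into bit i is recovered as c_i = t_{i-1} h_{i-1}; c_0 = 0.
    c = [0] + [t[i - 1] & h[i - 1] for i in range(1, n_bits)]

    total = 0
    for i in range(n_bits):
        total |= (p[i] ^ c[i]) << i
    return total | ((t[n_bits - 1] & h[n_bits - 1]) << n_bits)  # carry-out


print(ling_add(200, 100, n_bits=8))
```

The extra AND with $t_{i-1}$ sits in the sum path, which is why the Ling carry path is one gate shorter than the Weinberger one.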

RCA vs PPA in Partial Sum Computation

RCA (Ripple Carry Adder): Depth = 5

PPA (Parallel Prefix Adder): Depth = 4

RCA vs PPA: Critical path length

Degree of Sparseness (M)   Ripple carry ($1+M$)   Parallel prefix ($2+\log_2 M$)
2                          3                      3
4                          5                      4
8                          9                      5
16                         17                     6
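The table can be regenerated directly from the two depth expressions. The helper names below are hypothetical, and M is assumed to be a power of two.

```python
def rca_depth(m: int) -> int:
    """Critical path of an M-bit ripple-carry partial sum: 1 + M."""
    return 1 + m


def ppa_depth(m: int) -> int:
    """Critical path of an M-bit parallel-prefix partial sum: 2 + log2(M)."""
    log2_m = m.bit_length() - 1
    if m < 1 or (1 << log2_m) != m:
        raise ValueError("M must be a power of two")
    return 2 + log2_m


rows = [(m, rca_depth(m), ppa_depth(m)) for m in (2, 4, 8, 16)]
for m, rca, ppa in rows:
    print(f"M={m:2d}  RCA={rca:2d}  PPA={ppa}")
```

The two structures tie at M = 2, and the parallel-prefix version is shorter for every larger M in the table.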

8-bit Partial Sum Computation using PPA Structure

EFFECT OF INCREASED SPARSENESS

Theoretical results

Total gate count

- Gate counts are equal for KS and LF adders.

Total Gate Complexity

- Complexity for a gate is defined as the number of inputs (1 for an inverter, 2 for a two-input NAND, etc.).
- For KS, sparse 4 gives the least complexity for 32- to 256-bit adders.
- For LF, sparse 2 gives the least complexity for 32- and 64-bit adders, and sparse 4 for 128- and 256-bit adders.
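The gate complexity metric defined above amounts to summing input counts over a netlist. In the sketch below the input counts follow from the gate names, but the example netlist and the name `gate_complexity` are hypothetical and do not correspond to any adder in the slides.

```python
# Number of inputs per gate type (an inverter counts 1, a two-input NAND 2, etc.).
GATE_INPUTS = {
    "INV": 1,
    "NAND2": 2,
    "NOR2": 2,
    "AOI21": 3,
    "OAI21": 3,
    "AOI22": 4,
    "OAI22": 4,
}


def gate_complexity(netlist: dict) -> int:
    """Total complexity of a netlist given as {gate_type: instance_count}."""
    return sum(GATE_INPUTS[gate] * count for gate, count in netlist.items())


# Hypothetical gate counts, used only to illustrate the metric.
example = {"INV": 2, "NAND2": 3, "AOI21": 1}
print(gate_complexity(example))
```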

Normalized Gate Complexity

- Complexities are normalized to their full carry tree (sparseness 1) complexities.
- For KS, sparseness achieves a 30% reduction in complexity.
- For LF, sparseness achieves a 20% reduction in complexity.

Total Wire Complexity

- Wire complexity is defined as the total wire length (e.g. a wire from bit 32 to bit 64 has a length of 32 units).
- For KS, complexity reduces as sparseness increases.
- For LF, the optimum sparseness for wire complexity is 2 for 32- and 64-bit adders, and 4 for 128- and 256-bit adders.
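The wire length metric above can be illustrated on a full (sparseness 1) prefix-2 Kogge-Stone tree. This is a sketch under assumptions: only inter-bit wires between prefix stages are counted, one wire per node, and the helper names are hypothetical. The slides do not spell out their exact accounting, so the totals need not match the plotted values.

```python
def total_wire_length(edges) -> int:
    """Sum of |dst - src| over all (src_bit, dst_bit) wires."""
    return sum(abs(dst - src) for src, dst in edges)


def kogge_stone_edges(n_bits: int) -> list:
    """Inter-bit wires of a full prefix-2 Kogge-Stone tree (assumed model)."""
    edges = []
    dist = 1
    while dist < n_bits:
        # At this stage every bit i >= dist receives a wire from bit i - dist.
        edges.extend((i - dist, i) for i in range(dist, n_bits))
        dist *= 2
    return edges


print(total_wire_length([(32, 64)]))            # the single-wire example from the slide
print(total_wire_length(kogge_stone_edges(64)))
```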

Normalized Wire Complexity

- Complexities are normalized to their full carry tree (sparseness 1) complexities.
- For KS, sparseness achieves an 80% reduction in complexity.
- For LF, sparseness achieves a 20% reduction in complexity.

Theoretical Results

• For 64-bit LF adders, sparse 2 yields both minimum gate complexity and minimum total wire length.
  – The reduction in gate complexity in the LF adder is due to the removal of buffers, as opposed to the more complex AND-OR gates in the KS adder.
  – Hence, the improvement in gate complexity for the LF adder is smaller than the improvement in the KS adder.
  – The increase in gate complexity beyond sparse 8 in the KS adder will offset the energy savings achieved through reduced wiring complexity.
• The energy-optimum sparseness degree is determined by the ratio of gate capacitance to wire capacitance.
  – In the low-performance design region, gate sizes are small; hence wire capacitances dominate, and KS sparse 8 is expected to outperform KS sparse 4 in energy at the same performance.
  – For the LF adder, on the other hand, it is not worth going beyond sparse 4 due to increased complexity in both measures.
• For 128- and 256-bit adders, sparse 4 yields the most savings for both KS and LF structures.

RESULTS

Technology

• 45nm TSMC
• VDD = 1.1V
• Temp = 25°C
• Typical process corner
• Multi-Vth standard cell library (low, standard, high)

STDCELL Library

Gate    Available Strength
AOI21   1x, 2x, 4x, 6x, 8x
AOI22   1x, 2x, 4x, 6x, 8x
INV     1x, 2x, 4x, 6x, 8x, 12x, 16x, 32x
NAND2   1x, 2x, 4x, 6x, 8x
NOR2    1x, 2x, 4x, 6x, 8x
OAI21   1x, 2x, 4x, 6x, 8x
OAI22   1x, 2x, 4x, 6x, 8x

Design Environment

• Designed adders
  – KS adder w/ full, sparse 2, sparse 4, and sparse 8 carry trees
  – LF adder w/ full, sparse 2, sparse 4, and sparse 8 carry trees
• Circuit sizing using Design Compiler
• Placement and routing using Encounter
• Post-layout simulations using Primetime
• Input driver: 16x inverter
• Output load: 16x inverter
• 25% activity at inputs
• Adders designed for minimum energy using delay targets between 300ps and 400ps.

Energy-Delay

Leakage Power

Wire Energy

Conclusion

• Energy savings of 50% and 22%, and leakage power savings of 70% and 40%, are achieved with increased sparseness degree of carry trees for KS and LF adders, respectively.
• For the 64-bit KS Ling adder, energy optimal sparseness is 4. For the 64-bit LF Ling adder, energy optimal sparseness is 2.
• Both optimal KS and LF adders reach the same minimum delay target of 300ps.
• Experimental results suggest that LF S2 is 7% more energy efficient than KS S4 at the minimum delay point.
• Theoretical results suggest that a sparse 4 carry tree should be used for both KS and LF adders of sizes 128-bit and above.

THANK YOU …Questions?
