
ISSN: 2278 – 909X, International Journal of Advanced Research in Electronics and Communication Engineering (IJARECE)

Volume 8, Issue 10, October 2019

All Rights Reserved © 2019 IJARECE

Design of Improved Parallel Self Timed Adder for High-Speed and Low-Power DSP Circuit Applications

K. V. Ganesh¹, V. Malleswara Rao²

¹Ph.D Scholar, Dept. of ECE, GITAM University, and Assistant Professor, VITAM College of Engineering, Visakhapatnam, Andhra Pradesh, India.

²Professor, Dept. of ECE, GITAM University, Visakhapatnam, Andhra Pradesh, India.

Abstract: In this paper, an improved parallel self-timed adder (IPASTA) is designed for high-speed and low-power digital signal processing (DSP) circuit applications. Multi-bit arithmetic additions are performed based on a recursive formula using sum and carry throughputs. Conventional adders such as ripple carry and carry look-ahead adders suffer from high circuit complexity and carry propagation delay, which degrade the performance of the entire integrated circuit. To avoid these issues, we propose an IPASTA, which is faster and less complex in design. Moreover, practical implementations of the multiplexer, half adder and complete detection unit are designed with fewer transistors than in the conventional method. The circuit simulations are carried out using the industry-standard Cadence tool with the 90-nm technology node. Comparisons are also performed with existing adder circuits. It has been observed that, using the proposed model, the speed is increased by 29% and the area is reduced by 30% compared with the conventional approach. Moreover, the proposed architecture significantly decreases the total power dissipation and chip area in a digital signal processor using folded tree topology. The total power saving is observed to be 20% compared with the existing architecture. Additionally, fan-in and fan-out issues and gate driving capabilities are improved in the proposed IPASTA.

Keywords: Binary adders, Parallel Self Timed Adders (PASTA), Very Large Scale Integration (VLSI), digital signal processing (DSP), Integrated circuit design.

1. Introduction

In a processor unit, binary addition is one of the most important operations for signal processing. Although there are some asynchronous circuits that do not depend on a clock signal, most adder circuits are implemented for clock-dependent synchronous designs [1]. Time quantization is not an issue in asynchronous circuits; thus they are free from timing issues. Instead of a clock signal, asynchronous circuits communicate with request-acknowledge handshaking signals to maintain pipelining. For adders, handshaking blocks are expensive to implement. Therefore, the implementation is done efficiently by dual-rail carry propagation, which provides an acknowledgement from the adder block with the help of the carry output. Thus, asynchronous adders are implemented with either dual-rail encoding of all signals or single-rail encoding with pipelining operation [2].

Considering the above constraints, an asynchronous parallel self-timed adder (PASTA) is proposed in [3]. The design contains half adders, multiplexers and a control unit that require minimal interconnect, which is highly preferred in VLSI implementation. To construct an asynchronous sequential cyclic adder with single-rail encoding, a feedback signal through an XOR gate is required [4], and such adders are more efficient than acyclic adders [5, 6]. In maximum-rate pipelining, new inputs are applied before the outputs stabilize [7-10]. In the proposed work, a single-rail pipeline separates the carry input from the propagation and gate delays in the closed path. Thus, it is a single-rail pipelined approach, which is different from ripple adders using dual-rail encoding.

In digital signal processing, power dissipation is a primary design parameter for circuit implementation. Accordingly, in many electronic gadgets, battery efficiency is an important performance parameter. The basic operation in a DSP architecture is addition, which is further used to develop complex circuits such as multipliers, memory decoding and arithmetic logic operations. In general, conventional adders such as ripple carry and carry look-ahead adders are replaced by parallel prefix adders. To optimize the power consumption in that block, we introduce a folded tree topology using the proposed improved parallel self-timed adder (IPASTA). The proposed architecture can be used to design DSP systems for high-speed and low-power applications.

This paper is organized as follows: The objective of the paper and the motivation behind this work are detailed in Section 1. The proposed system architecture and the operation of the self-timed adder are explained in Section 2. The simulation results of the proposed adder and the comparison with conventional systems are presented in Section 3. The analysis of the DSP architecture with folded tree topology is performed in Section 4. Finally, conclusions are drawn in Section 5.

2. Architecture of Parallel Self-Timed Adder (PASTA)

The basic building blocks of IPASTA are the half adder, the multiplexer and the complete detection unit; the architecture of PASTA is shown in Fig 1. The operation of PASTA starts with the half adders, which accept two input binary streams and generate the corresponding sum and carry outputs using the iterative algorithm; the operation ends when every generated carry is zero.

Figure 1. Module level architecture of Parallel self-timed adder.

The input binary stream flow and the output sum and carry propagation are described below:

(i) The select line SEL, which is common to all multiplexers, determines which data enter the multiplexers. For SEL=0, the input binary data enter the multiplexers, which leads to the initial sum and carry outputs. For SEL=1, the feedback signals of the present sum and previous carry are selected and give the next sum and carry; this selection takes effect on the 0-to-1 transition of SEL.

(ii) In the half adder stage, the multiplexer outputs are taken as inputs, and the half adder outputs are passed to the complete detection unit. For SEL=0, the addition is performed on the input binary bits; for SEL=1, the addition is performed on the present sum and previous carry, which gives the final sum and carry results.

(iii) The complete detection unit (CDU) contains parallel pseudo-NMOS transistors, which help to avoid a high fan-in requirement. The output of the CDU is enabled (TERM=1) only if all carry values become zero.
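The interplay of SEL, the half adders and TERM described in (i)-(iii) can be sketched as a small behavioural model. This is only an illustration under stated assumptions, not the authors' circuit: the function names (`half_adder`, `completion_detect`, `pasta_model`) are hypothetical, each loop pass stands for one settling step of the asynchronous feedback path, and one extra bit position is added to hold the carry-out.

```python
def half_adder(x, y):
    """Return (sum, carry) of two bits."""
    return x ^ y, x & y


def completion_detect(carries):
    """CDU model: TERM is 1 only when every carry is 0."""
    return int(not any(carries))


def pasta_model(a, b, n):
    """Add two n-bit unsigned integers; return (result, trace of (SEL, TERM))."""
    width = n + 1  # assumption: one extra bit position holds the carry-out
    a_bits = [(a >> p) & 1 for p in range(width)]
    b_bits = [(b >> p) & 1 for p in range(width)]
    sums = [0] * width
    carry_in = [0] * width  # carry_in[p] is the carry arriving at bit p
    sel = 0                 # SEL=0: multiplexers pass the operands
    trace = []
    while True:
        x, y = (a_bits, b_bits) if sel == 0 else (sums, carry_in)
        outputs = [half_adder(xp, yp) for xp, yp in zip(x, y)]
        sums = [s for s, _ in outputs]
        carry_out = [c for _, c in outputs]
        carry_in = [0] + carry_out[:-1]  # carry out of bit p feeds bit p+1
        term = completion_detect(carry_in)
        trace.append((sel, term))
        if term:
            break
        sel = 1             # SEL=1: multiplexers pass the fed-back sum and carry
    result = sum(s << p for p, s in enumerate(sums))
    return result, trace
```

For example, `pasta_model(5, 3, 4)` returns the sum 8 together with the trace `[(0, 0), (1, 0), (1, 0), (1, 1)]`: one initial step with SEL=0, then three feedback steps with SEL=1, the last of which raises TERM.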

2.1 Recursive algorithms for binary addition

The recursive formulas for the sum and carry in the initial phase and the iterative phase are represented as follows. Let $S_p^q$ and $C_{p+1}^q$ be the sum and carry of the pth bit at the qth iteration. For the initial phase (q=0), the sum and carry expressions are given in (1):

$$S_p^0 = a_p \oplus b_p, \qquad C_{p+1}^0 = a_p b_p \tag{1}$$

Then the recursive formula for binary addition at the qth iteration is formulated as below:

$$S_p^q = S_p^{q-1} \oplus C_p^{q-1}, \quad 0 \le p < n$$

$$C_{p+1}^q = S_p^{q-1} C_p^{q-1}, \quad 0 \le p < n \tag{2}$$

The recursive iteration is terminated at the ith iteration, when the following condition occurs:

$$C_n^i + C_{n-1}^i + C_{n-2}^i + \cdots + C_1^i = 0 \tag{3}$$
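As a hand-worked illustration of (1)-(3), consider the 4-bit operands a = 0101 and b = 0011 (values chosen here for illustration, not taken from the paper). Writing $S^q$ for the sum word and $C^q$ for the carry word at iteration q, with each carry placed at the bit position it feeds, the iterations proceed as:

```latex
\[
\begin{aligned}
q=0:\quad & S^0 = 0101 \oplus 0011 = 0110, & C^0 &= (0101 \wedge 0011) \ll 1 = 0010 \\
q=1:\quad & S^1 = 0110 \oplus 0010 = 0100, & C^1 &= (0110 \wedge 0010) \ll 1 = 0100 \\
q=2:\quad & S^2 = 0100 \oplus 0100 = 0000, & C^2 &= (0100 \wedge 0100) \ll 1 = 1000 \\
q=3:\quad & S^3 = 0000 \oplus 1000 = 1000, & C^3 &= (0000 \wedge 1000) \ll 1 = 0000
\end{aligned}
\]
```

All carries are zero at q = 3, so the termination condition (3) holds and the result is 1000 (decimal 8 = 5 + 3).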

The sum and carry for any number of bits are computed correctly by the recursive algorithm in (1)-(3). The proof of correctness of the iterative algorithm is outlined below; in this proof, the superscript p denotes the iteration and the subscript denotes the bit position. At the pth iteration, let some bits i have $C^p_{i+1} = 0$, and let q be a bit for which $C^p_{q+1} = 1$. It is shown that this carry is successfully propagated to the next higher bit in the (p+1)th iteration. As in the state diagram shown in Fig 2, at the pth iteration the qth bit state $(C^p_{q+1}, S^p_q)$ and the (q+1)th bit state $(C^p_{q+2}, S^p_{q+1})$ can be in any of the states (0, 0), (0, 1), or (1, 0). Since $C^p_{q+1} = 1$, it follows that $S^p_q = 0$; hence, from the carry expression in (2), $C^{p+1}_{q+1} = 0$ for any input condition between bits 0 and q. Now consider the (q+1)th bit state $(C^p_{q+2}, S^p_{q+1})$ at the pth iteration, which can be (0, 0), (0, 1), or (1, 0). In the (p+1)th iteration, the states (0, 0) and (1, 0) from the pth iteration produce the output (0, 1), following (1) and (2). For the state (0, 1), the carry propagates through this bit level, satisfying (2). This proof helps to understand the iterative algorithm for binary addition using the state diagram.
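The argument can also be checked by brute force for a small word length. The sketch below is an illustrative check, not part of the original proof: the function name `verify_recursion` is hypothetical, and operands are packed into Python integers so that the carry word is stored already shifted to the bit it feeds. It applies (1)-(3) to every pair of n-bit operands, records each per-bit state (carry out, sum) that occurs, and counts the iterations needed.

```python
from itertools import product


def verify_recursion(n):
    """Exhaustively apply the recursion to all pairs of n-bit operands.

    Returns (set of observed per-bit states, worst-case iteration count).
    """
    seen_states = set()
    max_iterations = 0
    for a, b in product(range(2 ** n), repeat=2):
        s, c = a ^ b, (a & b) << 1          # initial phase, eq. (1)
        iterations = 0
        while True:
            for bit in range(n):
                # state of one bit: (carry out of the bit, sum of the bit)
                seen_states.add(((c >> (bit + 1)) & 1, (s >> bit) & 1))
            if c == 0:                      # termination condition, eq. (3)
                break
            s, c = s ^ c, (s & c) << 1      # iterative phase, eq. (2)
            iterations += 1
        assert s == a + b                   # the recursion yields the true sum
        max_iterations = max(max_iterations, iterations)
    return seen_states, max_iterations
```

For n = 4, the observed states are exactly (0, 0), (0, 1) and (1, 0), in line with the three states used in the proof, and no operand pair needs more than n iterations after the initial phase.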

2.2 Mealy state diagram for binary addition

The state diagrams for the initial and iterative phases of PASTA are shown in Fig 2a and 2b. In the state diagram, each state is represented by (Cp+1 Sp), where Cp+1 and Sp are the carry and sum values of the pth bit adder block. In the initial phase, the circuit acts as a half adder; based on the input values, the next-state sum and carry are represented in the state diagram in Fig 2a.

Figure 2. (a) Initial phase operation with state as: (Cp+1 Sp) and transition as: ap bp, and (b) iterative phase operation with state as: (Cp+1 Sp) and transition as: Cp.
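The two state diagrams can be written out as transition tables. The following is a minimal sketch derived from (1) and (2) rather than transcribed from Fig 2; the names `next_state_initial`, `next_state_iterative`, `initial_phase` and `iterative_phase` are hypothetical.

```python
def next_state_initial(a_p, b_p):
    """Initial phase: the bit slice acts as a half adder on (a_p, b_p)."""
    return (a_p & b_p, a_p ^ b_p)          # state is (C_{p+1}, S_p)


def next_state_iterative(state, c_p):
    """Iterative phase: combine the stored sum with the incoming carry C_p."""
    _, s_p = state                         # the stored carry-out is not reused
    return (s_p & c_p, s_p ^ c_p)


STATES = [(0, 0), (0, 1), (1, 0)]          # (1, 1) never occurs

# Transition tables keyed by the diagram's edge labels.
initial_phase = {(a, b): next_state_initial(a, b)
                 for a in (0, 1) for b in (0, 1)}
iterative_phase = {(st, c): next_state_iterative(st, c)
                   for st in STATES for c in (0, 1)}
```

In this table, an incoming carry of 1 moves states (0, 0) and (1, 0) to (0, 1), where the carry is absorbed, and moves state (0, 1) to (1, 0), where the carry is passed to the next bit.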

3. Results and Discussions

The following design parameters are considered while calculating the performance factors: the width of each transistor (W) is 120nm, the channel length (L) is 45nm, and the circuit is implemented using the Cadence Virtuoso tool at the 45nm technology node.

In the IPASTA circuit, the select operation of the multiplexer block shown in Fig 3 is implemented using pass transistor logic instead of CMOS. As a result, the gate count, power dissipation and chip area are reduced without affecting the logical operation; the simulation results shown in Fig 3 confirm the correctness of the circuit.

Further, the outputs of the multiplexers are provided to the half adder circuits, where the actual addition is performed. For SEL=0, the inputs enter all half adders, which provide the sum and carry results, treated as the present-stage sum and carry. Then, for SEL=1, the present carry is added to the sum of the next stage, which gives the final sum at that half adder stage.

Figure 3. (a) Multiplexer circuit and (b) simulated waveforms.

Figure 4. (a) Proposed half adder circuit and (b) carry generator circuit.

Figure 5. Simulation results of (a) half adder and (b) carry generator circuit.

Figure 6. Improved parallel self-timed adder: (a) schematic circuit, (b) simulation result and (c) layout.

In the adder stage shown in Fig 4(a), pass transistors are used in the intermediate stage to reduce the gate count and area requirement without affecting the logical operation. The simulated waveforms for sum and carry for various inputs are depicted in Fig 5(a) and 5(b). The complete detection unit is designed as in the PASTA circuit; it helps in terminating the iterations and provides the final carry and sum outputs, using the sum-of-carries expression in (3). The complete design of IPASTA, built with optimized circuitry, is depicted in Fig 6(a). The simulation results and layout are shown in Fig 6(b) and 6(c), obtained using Cadence Virtuoso at the 45nm technology node. The layout is performed in a standard cell environment with two metal layers. The power, area and delay analysis is explained in the following sections.

3.1 Comparison of performance parameters

For the performance comparison between the proposed adder and the various existing adders, a fan-out-of-4 (FO4) load is considered at the output. The total transistor count of the IPASTA circuit is 24, as shown in Fig. 6.

The power dissipation in any device is mainly due to static and dynamic activity. Static power is consumed by the device in a stable state, whereas dynamic power dissipation occurs during the transition between two stable states. The total power dissipation is the algebraic sum of the static and dynamic powers, represented as

$$P_{Total} = P_{Static} + P_{Dynamic} \tag{4}$$

$$P_{Static} = I_{Static} V_{DD} \tag{5}$$

$$P_{Dynamic} = P_{tr} C_L V_{DD}^2 f \tag{6}$$

where $I_{Static}$ is the static current produced due to leakage and short circuit, and $V_{DD}$ is the supply voltage. The value of $C_L$ depends on the far-end transistor size, f is the operating clock signal frequency, and $P_{tr}$ is the switching transition from logic level 0 to $V_{DD}$. The performance of the device can be increased by reducing the $C_L$ factor.
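Equations (4)-(6) translate directly into a small calculation. The sketch below is illustrative only: the function names are hypothetical and the operating point (1 µA static current, 1 V supply, 10 fF load, 1 GHz, transition factor 0.5) is an assumed example, not a value from the simulations reported here.

```python
def static_power(i_static, v_dd):
    """Eq. (5): static power in watts."""
    return i_static * v_dd


def dynamic_power(p_tr, c_load, v_dd, freq):
    """Eq. (6): dynamic power in watts."""
    return p_tr * c_load * v_dd ** 2 * freq


def total_power(i_static, p_tr, c_load, v_dd, freq):
    """Eq. (4): total power is the sum of static and dynamic power."""
    return static_power(i_static, v_dd) + dynamic_power(p_tr, c_load, v_dd, freq)


# Hypothetical operating point (assumed values, not from the paper)
example = dict(i_static=1e-6, p_tr=0.5, c_load=10e-15, v_dd=1.0, freq=1e9)
p_full_load = total_power(**example)
p_half_load = total_power(**{**example, "c_load": 5e-15})
```

With these assumed numbers, the dynamic term is about 5 µW and the static term 1 µW; halving the load capacitance halves only the dynamic term, which is why a smaller $C_L$ lowers the total.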




Table 1. Comparison of Power and Area for PASTA and IPASTA in 90nm technology node

Adder circuit   Power requirement (90nm)   Time delay (90nm)   Area requirement (p sq. meter)
RCA             0.82 mW                    2.8 ns              0.8
CLA             0.62 mW                    1.9 ns              0.5
PASTA           0.09 mW                    0.78 ns             0.216
IPASTA          0.064 mW                   0.52 ns             0.131

Table 1 reports the total power dissipation (static plus dynamic) of the various adders. The total power dissipated by IPASTA is 0.064mW, whereas PASTA consumes 0.09mW. When the number of processing elements is increased, the power consumed by the proposed system also increases correspondingly. The power saving efficiency of IPASTA compared to PASTA is 28%.

In digital systems, other important aspects of integrated chip design are area and time delay. From Table 1, the PASTA circuit has a delay of 0.018ns and the IPASTA circuit has a delay of 0.0126ns, whereas the delays of conventional adders such as the ripple carry adder (RCA) and the carry look-ahead adder (CLA) are 2.8 ns and 1.9 ns, respectively. Therefore, the proposed structure has less delay than the PASTA circuit. The delay saving efficiency of the proposed circuit is 30% with respect to the existing PASTA. The chip area requirement is 0.131 p sq. meters for the proposed design and 0.216 p sq. meters for the PASTA circuit, since the proposed design is implemented with fewer gates than PASTA. Therefore, the area saving efficiency is around 30%.
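The saving efficiencies quoted above follow the usual relative-reduction formula, (reference − improved) / reference. A minimal sketch, assuming that definition (the helper name `percent_saving` is hypothetical) and using the power and delay figures quoted in the text:

```python
def percent_saving(reference, improved):
    """Relative reduction of `improved` with respect to `reference`, in percent."""
    return 100.0 * (reference - improved) / reference


# Power: PASTA 0.09 mW vs IPASTA 0.064 mW
power_saving = percent_saving(0.09, 0.064)    # about 28.9, quoted as 28% in the text

# Delay: PASTA 0.018 ns vs IPASTA 0.0126 ns
delay_saving = percent_saving(0.018, 0.0126)  # about 30
```

The same helper can be applied to any pair of entries in Table 1.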

4. Analysis of DSP Architecture with Folded Tree Topology Using the Proposed IPASTA

In a digital signal processing (DSP) unit, power optimization is one of the important design criteria. For applications such as wireless sensor networks and mobile communication, the power budget is a key aspect [11]. The DSP structure includes the core, processor and board levels [12]. The major tasks are performed in the core, which is a sub-part of the processor section. Most applications are implemented using DSP processors because they offer better performance at a lower price than analog signal processors.

Some signal processing algorithms, such as Kalman filtering and coherent beam focusing, require a DSP architecture. These applications require high energy for processing. Therefore, the energy requirement needs to be scaled down by using power-efficient adder circuits in the array of logical units shown in Fig 7. To achieve this, IPASTA circuits are used in the DSP processor, which improves the battery lifetime and meets the requirements of low-power VLSI circuits. The main blocks in the DSP architecture are the array of logical units, the finite state machine (FSM), the counter and the memory. The counter holds the number of times the logical units are accessed. The FSM is used to start and terminate the counter based on the instructions received. Data is temporarily stored in a buffer until the previous data is processed. Address decoders are used to choose the next data to be sent to the logical units. In a DSP architecture, the array of logical units changes based on the application.

Figure 7. DSP architecture using folded tree topology, in which the array of logical units contains IPASTA circuits.

In this study, the folded tree method with parallel prefix operations is used to reduce chip area by folding the array of logical units shown in Fig 8. Fig 8(a) shows some of the arithmetic operations involved in one of the logical units; addition and subtraction are used multiple times, which is redundant. Hence, we applied the folded tree algorithm to reduce the redundant operations and the interconnect requirement, avoiding high chip area and high power requirement, as shown in Fig 8(b).
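The general idea of folding can be illustrated with a plain tree reduction. The sketch below is a generic illustration under stated assumptions and does not reproduce the logical unit of Fig 8: it assumes a binary adder tree over N inputs (N a power of two), which needs N − 1 adders when fully unrolled, and shows the same computation carried out by N/2 adders reused over log2(N) rounds. The function name `folded_tree_sum` is hypothetical.

```python
def folded_tree_sum(values, add=lambda x, y: x + y):
    """Reduce `values` with a binary tree whose levels share one bank of adders.

    Returns (result, number of physical adders, number of rounds).
    Assumes len(values) is a power of two and at least 2.
    """
    n = len(values)
    assert n >= 2 and n & (n - 1) == 0, "length must be a power of two"
    physical_adders = n // 2          # widest tree level; reused by later levels
    level = list(values)
    rounds = 0
    while len(level) > 1:
        # each adjacent pair is handled by one adder from the shared bank
        level = [add(level[i], level[i + 1]) for i in range(0, len(level), 2)]
        rounds += 1
    return level[0], physical_adders, rounds
```

For eight inputs, the unrolled tree would use 7 adders, while the folded form uses 4 adders over 3 rounds, trading time for area.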

Figure 8. (a) Logical unit performing arithmetic operations and (b) logical unit using folded tree topology.

Using the Cadence Virtuoso 90nm technology node, the proposed architecture is simulated using the folded tree algorithm, with the logical units replaced by IPASTA. The folded tree algorithm folds the architecture, which results in low area and power requirements. Further, the IPASTA replacement results in less power, more speed and less area. The power and area calculations of the DSP with the carry look-ahead adder (CLA), PASTA and IPASTA are shown in Fig 9.

Figure 9. Application of folded tree algorithm to DSP with carry look-ahead adder (CLA), PASTA and IPASTA: (a) variation in power requirement and (b) variation in area requirement.

From Fig 9(a) and 9(b), the power of the digital signal processor is reduced by using the proposed IPASTA in the array of logical units. The total logical element count with CLA is 133 (48 processor elements and 85 adder elements); for IPASTA it is 71 (23 adder elements and 48 processor elements). Using Table 1, the power required by the DSP is calculated as 97mW with CLA and 87.47mW with IPASTA, as seen in Fig 9(a). Therefore, the percentage power saving using the folded tree and IPASTA circuit in the DSP is around 10%. Further, on applying the folded tree architecture to the DSP, the area requirements with different adder circuits are depicted in Fig 9(b); the percentage area reduction using IPASTA compared to CLA and PASTA is 50% and 16.67%, respectively. Thus, in the DSP, the area and power requirements are decreased by the folded tree algorithm and the incorporation of the IPASTA adder.
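The element counts and the power saving quoted above can be re-derived with a few lines of arithmetic. This is a sketch using only the figures stated in the paragraph; the dictionary layout and helper names are hypothetical.

```python
# Figures quoted in the text for the DSP with each adder type
cla = {"processor": 48, "adder": 85, "power_mw": 97.0}
ipasta = {"processor": 48, "adder": 23, "power_mw": 87.47}


def total_elements(config):
    """Total logical element count: processor elements plus adder elements."""
    return config["processor"] + config["adder"]


def relative_reduction(reference, improved):
    """Fractional reduction of `improved` with respect to `reference`."""
    return (reference - improved) / reference


element_reduction = relative_reduction(total_elements(cla), total_elements(ipasta))
power_reduction = relative_reduction(cla["power_mw"], ipasta["power_mw"])
```

The totals come out as 133 and 71 elements, and the power reduction is roughly 0.098, consistent with the "around 10%" stated in the text.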

5. Conclusions

The improved PASTA circuit is designed with less area and high speed, and works in a parallel fashion for independent carry chains. It achieves better performance than the conventional RCA, CLA and PASTA circuits in terms of power dissipation, time delay, gate count and chip area. The optimized IPASTA circuit uses 24 transistors, so the chip area requirement of the proposed design is 0.131 p sq. meters and the percentage area saved is around 30.4%. The power requirement is 0.064mW, a reduction of 28%, and the time delay is 0.0126ns, a reduction of 30% compared with the conventional PASTA circuit. Additionally, the proposed adder is analyzed for a DSP architecture with folded tree topology. The chip area is reduced by reconfiguring the processing elements in the folded tree architecture. Thus, with this enhanced performance compared to other adders, the IPASTA circuit can be used in high-speed and low-power applications.

References

[1] M. Z. Rahman, L. Kleeman and M. A. Habib, "Recursive Approach to the Design of a Parallel Self-Timed Adder," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 23, no. 1, pp. 213-217, Jan. 2015.

[2] J. Sparsø and S. Furber, Principles of Asynchronous Circuit Design. Boston, MA, USA: Kluwer Academic, 2001.

[3] S. Ghosh and K. Roy, "Novel Low Overhead Post-Silicon Self-Correction Technique for Parallel Prefix Adders Using Selective Redundancy and Adaptive Clocking," IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 19, no. 8, pp. 1504-1507, Aug. 2011.

[4] M. Z. Rahman and L. Kleeman, "A delay matched approach for the design of asynchronous sequential circuits," Dept. Comput. Syst. Technol., Univ. Malaya, Kuala Lumpur, Malaysia, Tech. Rep. 05042013, 2013.

[5] M. D. Riedel, "Cyclic combinational circuits," Ph.D. dissertation, Dept. Comput. Sci., California Inst. Technol., Pasadena, CA, USA, May 2004.

[6] X. Xu and Y. Hong, "Matrix Approach to Model Matching of Asynchronous Sequential Machines," IEEE Transactions on Automatic Control, vol. 58, no. 11, pp. 2974-2979, Nov. 2013.

[7] W. Liu, C. T. Gray, D. Fan, and W. J. Farlow, "A 250-MHz wave pipelined adder in 2-μm CMOS," IEEE J. Solid-State Circuits, vol. 29, no. 9, pp. 1117-1128, Sep. 1994.

[8] F.-C. Cheng, S. H. Unger, and M. Theobald, "Self-timed carry-lookahead adders," IEEE Trans. Comput., vol. 49, no. 7, pp. 659-672, Jul. 2000.

[9] S. M. Nowick, "Design of a low-latency asynchronous adder using speculative completion," IEE Proceedings - Computers and Digital Techniques, vol. 143, no. 5, pp. 301-307, Sep. 1996.

[10] N. Weste and D. Harris, CMOS VLSI Design: A Circuits and Systems Perspective. Reading, MA, USA: Addison-Wesley, 2005.

[11] A. Wang and A. Chandrakasan, "Energy-efficient DSPs for wireless sensor networks," IEEE Signal Processing Magazine, vol. 19, no. 4, pp. 68-78, Jul. 2002.

[12] Steven W. Smith, The Scientist and Engineer's Guide to Digital Signal Processing, 2nd ed., California Technical Publishing, 1999.

K. V. GANESH received his B.E degree in Electronics and Communication Engineering from Andhra University in the year 2009 and received his M.Tech degree in the year 2011 from JNT University, Kakinada. He is a Ph.D scholar in GITAM Institute of Technology, GITAM University, Visakhapatnam, India. His research activities are related to Low Power VLSI Design.

DR. V. MALLESWARA RAO received his B.E degree in Electronics and Communication Engineering from Andhra University in the year 1985, received his M.E degree in the year 1989 from Andhra University, and completed his Ph.D from J.N.T.U Kakinada, India. He is working in GITAM Institute of Technology, GITAM University, Visakhapatnam, as Professor and H.O.D. He is a life member of AMIE. His research activities are related to Low Power VLSI Design, Microwave, and Bio-Signal Processing.
