implementation of power efficient 12-bit vedic multiplier …

28

Int. J. Elec&Electr.Eng&Telecoms. 2016 Bala Kiran P and Padma Priya K, 2016

IMPLEMENTATION OF POWER EFFICIENT12-BIT VEDIC MULTIPLIER BY USING ANT

ARCHITECTUREBala Kiran P1* and Padma Priya K2

*Corresponding Author: Bala Kiran P,[email protected]

In this paper, we propose a new paradigm that is a power efficient and low area Vedic multiplierto meet the demand of high precision and low power integrating with the Algorithmic NoiseTolerant (ANT) architecture. The truncation errors can be decreased by implementing ancientVedic multiplication techniques combined with modern probability and statistics method. Thehardware complexity of multiplier is greatly simplified. In this 12*12 bit Vedic multiplier, the effectivetotal on-chip power can be decreased by 15% and the circuit area in our proposed design islowered by 25% compared with existing fixed width RPR multiplier. Hence the power is efficientlyused and implemented.

Keywords: Vedic multiplier, ANT architecture, Truncation errors

INRODUCTIONThe need of ultra low power consumptionmeets the demand of using portable electronicdevices and Digital Signal Processors (DSP).Researchers demonstrated many low-powertechniques to reduce the power dissipation.Supply voltage scaling is the best method toadopt in deep sub-micrometer technologiesincluding CMOS technologies. In generalpower consumption of any electronic circuit isproportional to square of supply voltage (TheInternational Technology Roadmap forSemiconductors, 2009). But the method of

ISSN 2319 – 2518 www.ijeetc.comVol. 5, No. 4, October 2016

© 2016 IJEETC. All Rights Reserved

Int. J. Elec&Electr.Eng&Telecoms. 2016

1 PG Scholar, Department of ECE, JNTUK, Kakinada, India.2 Professor, JNTUK, Kakinada, India.

Voltage Over Scaling (VOS) (Hedge andShanbhag, 1999) leads to degeneration ofnoise (S/N). The complex logic sequentialoperations are performed in DSPs likeaddition, multiplication etc. One of the mostcomplex operations is array multiplication.Since the size of array increases, the demandof power consumption also increasesdrastically. So in order to compensate thesefactors, A novel approach of Algorithmic NoiseTolerant (ANT) technique (Shim et al., 2004)adjoined with voltage over scaling in the mainblock with Reduced Precision Replica (RPR)

Research Paper

29


to minimize the errors and also achieves theamount of energy saving.

ANT architecturehas preferred because ithas the main advantage of lowering thetruncation of soft errors without compromisingthe area of the circuit, power consumption andsupply voltage scaling. Instead of using full-widthrpr (Kidambi et al., 1996), the fixed width rpr isdesigned in the previous technique with the useof probability, statistics and partial product termanalysis to find accurate compensation vectorfor efficient RPR design. It gives exact andefficient results but still the problem of an areaoverhead. So we implement a 12*12 ANTarchitecture is designed and implemented withancient but modified Vedic multiplier(Premananda et al., 2013). So the circuit canbe made easier and the critical path delayimproved without compromising the chip areaand power consumption.

In general the speed of the multipliers isconstrained by the speed of the adders usedfor partial product terms weight. The partialproduct terms are realized by using carry skiptechnique. Different algorithms are used inVedic mathematics for multiplication. One ofthe best and basic techniques is UrdhvaTiryagbhyam (Premananda et al., 2013). It isalso known as Vertical-Cross algorithm. Thismethod is used in various branches ofengineering for computation and signalprocessing (digital) techniques.The Vedicmultiplier is integrated with the ANTarchitecture and calculated the error precisionand the amount of power is greatly reduced.

EXISTED ANTARCHITECTURE DESIGNThe existed ANT architecture consists of Main

Digital Signal Processor (MDSP) block andError Correction (EC) block. VOS (voltageover scaling) method is used in MDSP block.The circuit is shown in fig 1 below. If Tcp is higherthan Tsam of the circuit then soft errors will beoccurred. Where, Tcp is critical path delay andTsam is sampling period. An exact copy ofMDSP but with reduced precision parametersand shorter computation delay is used as EC.Output of MDSP is ya[n] in which there existsan amount of input independent soft errors.Output of RPR block is yr[n]. If Tcp is smallerthan Tsam, yr[n] is applied to detect errors in ya[n].To attain the errors against the threshold thedifference |ya[n] -yr[n] | is considered. If thedifference is larger than Th then y^[n] = yr[n].Otherwise y [̂n] = ya[n].

Th is determined by,

Th = maxvinput |yo[n] – yr[n] ...(1)

where yo[n] is error free output signal. In thistechnique power consumption is lowered butstill maintains SNR degradation. Fixed-widthmultiplier has the advantage of avoid usingincreasing bit width in DSP applicationscompared to full-width multiplier design. Toconstruct a fixed width DSP with n-bit input andn-bit output by truncating n-bit LSB output isbest solution. It results in rounding error. Therounding error can be compensated with theconstant correction value (Lim, 1992; Schulteand Swartzlander, 1993; and Jou and Wang,2000) rather than variable correction value(Curticapean and Niittylahti, 2001; Strolloet al., 2005; Kuang and Wang, 2006; Petraet al., 2010; and Wey and Wang, 2010). Infixed width, the compensation error iscorrected by overall truncation error of MDSPblock. Error compensation algorithm makesuse of partial product terms with the largest

30


weight of LSB with the concept of probability,statistics and linear regression analysis todetect error compensation value (Jou et al.,1999).

In this technique, there is no particular fixed-width RPR block is designed. Fixed widthmultiplier circuits are integrated with full-widthmultiplier circuit to compensate the error value.The Error compensation circuit should belocated in the non-critical path of fixed-widthRPR in order not to increase the critical pathdelay.

Error Compensation VectorThe main function of RPR block is to constructthe errors occurred in the output of MDSP andmaintains the SNR of whole system whileachieving the low supply voltage. The circuitof compensation vector is shown in Figure 2.In MDSP of ANT Baugh-Wooley multiplier ofn-bit inputs of X and Y is given by

X= Ʃ (i=o to n-1) xi.2i Y= Ʃ (i=0 to n-1) yi.2

i

...(2)

The multiplication result is denoted by ‘P’which is

P= Ʃ (k=0 to 2n-1) pk.2k = Ʃ (j=0 to n-1) Ʃ(i=0 to n-1) xi yj. 2

i+j

...(3)

Figure 1: Existed Fixed-Width Multiplier by Using ANT Architecture

Figure 2: 12*12 ANT Multiplier with 6-BitFixed-Width RPR Block

31


The (n/2) bit unsigned full-width Baugh-Wooley product array is divided into 4 sub-sets. They are: Most Significant Part (MSP)Input Correction Vector ICV (), Minor InputCorrection Vector MICV () and LeastSignificant Part (LSP). Here, for computationalpurpose only MSP part is present remainingsub-sets are truncated. ICV () and MICV ()are applied to construct the errorcompensation algorithm because they havehighest Weight.

Accuracy of fixed-width is given by

= P – Pt ...(4)

where P is output of MDSP and Pt is fixed-width RPR output. Average truncated error isdistributed between and + 1. If we analyzethe multiple compensation vectors for averagepurpose which are greater than 0.5*2(3n/2). Sothat we can lower the compensation erroreffectively. Multiple compensation vectors areconstructed by combining f (ICV) and f (MICV)(I-Chyn Wey et al., 2015). The weightedvectors is equal to half of .

Fixed Width RPR Multiplier withCompensation Constructed by ICVand MICVIn order to realize the fixed width RPR, weconstruct one directly injecting ICV () and oneMICV () to detect the insufficient errorcompensation cases. ICV is realized by C1,C2, C3, …, C (n/2)-1 which is shown in Figure 3.Other compensation vector is constructed byone conditional controlled OR gate. Input of ORgate is X(n/2)Y(n-1). Other input is conditionalcontrolled by judge whether = 0 and l

≠ 0 aswell. Cm1 indicates the controlled input of ANDgate. Cm and X (n/2) Y (n-1) are injected into twoinput OR gate. Since it gives the optimal

solution in which out performs the low powerconsumption and area efficient architecture.

OVERVIEW OF VEDICMULTIPLIERMultipliers (Binary Multiplier) are the criticalcomponents in DSP devices. The performanceof DSP processor is based on mainly complexcalculations of multipliers (Arraymultiplications). So the design of a multiplier ina DSP processor has a major impact on theperformance. Vedic multipliers are designedbased on the 16 sutras(principles). It helps inthe field of different branches of engineering forcomputation process. Integrating Multiplicationwith Vedic mathematician techniques helps thesaving of computation time. It enhances thespeed of operation. These methods can bedirectly applied to both differential and integralcalculus of any kind.

PROPOSED 12-BIT POWEREFFICIENT VEDICMULTIPLIERIn the previous method, the RPR block isconstructed to truncate the rounding errors.

Figure 3: High Accuracy Fixed Width RPRMultiplier Combining ICV and MICV

as Compensation Vectors

32


So that the complex circuitry is built usingcombinational logic gates. Here, weimplement a 12*12 bit Vedic multiplier byadopting Urdhva Tiryagbhyam. It is a verticalcross algorithm in which the multiplier andmultiplicand are product each other and addedwith the carry from the previous step. It leavesa result and a carry. This carry is added in thenext step and hence the process goes on. TheRPR block is implemented with 6-bit vectorusing full-adders. This 12-bit multiplier blockis designed using four 6-bit multipliers and two8-bit adders and two 12-bit adders. At the firststep, the carry is taken as zero. The 12-bitmultiplier circuit is shown below.

The first most 6 LSB bits of two operandsare going to product each other [(a5-a0) and(b5-b0)]. Then we get the 12-bit output whichcarries the LSB 6 bits are left and the MSB 6bits are given to 12-bit Carry Look ahead adderalong with (n/2) 0’s are appended to it. In thesimilar way, remaining 3 product terms are going

to continue the process. At the bottom side, allthe 12-bit outputs are added together and givethe result of 24-bit output with leaves a carry. ACLA or fast adder is used in digital logic. Itimproves speed by reducing the amount of timerequired to determine carry bits. It depends ontwo things.

1. Calculating, for each digit position, whetherthat position is going to propagate a carryif one comes in from the right.

2. Combining these calculated values to be ableto deduce quickly whether, for each group ofdigits, that group is going to propagate a carrythat comes in from the right.

The multiplier is integrated with the proposedANT architecture to find out the error precisionvalue using modelsim-Altera Starter edition.Finally, after simulating the verilog code theamount of power is greatly reduced to 24.01W. Hence we can say that the approach of Vedicmultiplier is power efficient.

Figure 4: 12-bit Vedic Multiplier

33


SIMULATION RESULTS

Figure 5: Simulation Result of Existed 12-Bit Baugh-Wooley Multiplier

Figure 6: Simulation Result of Existed Fixed-Width RPR Block

34


Figure 8: Power Consumption of Existed 12-bit Fixed Width RPR

Figure 9: Power Consumption of Proposed 12-bit Fixed Width RPR

Figure 7: Simulation Result of Proposed 12-bit Vedic Multiplier

35


Des ign Name12-bit Fixed Width

RPR Multiplier12-bit Power Efficient

Vedic Multiplier

W ord Length(n)

12 16

Product W ordLength (n)

24 32

Trans is tor/GateCount

1372 1240

Power Supply 1 1

Total On-ChipPower

38.829 W 24.013

Chip Size 4616.5µm2

2545.2 µm2

ProcessTechnology

90 nm CM OS 45 nm CM OS

M ax Frequency 200 M Hz 250 M Hz

Delay (ns ) 9.868 7.940

Table 1: Performance Comparisonof Baugh-Wooley and Vedic Multiplier

CONCLUSIONIn this paper, a 12-bit low-error and area-efficient xed-width RPR-based ANT multiplierdesign was presented in 90 nm CMOStechnology. The proposed 12-bit Vedicmultiplier with 8-bit fixed width RPR circuit isimplemented in 45 nm CMOS technology.Under 0.6 V supply voltage and 200-MHzoperating frequency, the total powerconsumption is 38.36 W. But for the 12-bitVedic multiplier it consumes only 24.013 W.Hence the power is efficiently used andimplemented.

REFERENCES1. Curticapean F and Niittylahti J (2001), “A

Hardware Efficient Direct DigitalFrequency Synthesizer”, in Proc. 8th IEEEInt. Conf. Electron., Circuits, Syst.,Vol. 1, September, pp. 51-54.

2. Hedge R and Shanbhag N R (1999),“Energy-Efficient Signal Processing via

Algorithmic Noise-Tolerance”, in Proc.IEEE Int. Symp. Low Power Electron.Des., August, pp. 30-35.

3. I-Chyn Wey, Chien-Chang Peng andFeng-Yu Liao (2015), “Reliable Low-Power Multiplier Design Using Fixed-Width Replica Redundancy Block”, inIEEE Transactions on Very Large ScaleIntegration (VLSI) Systems, Vol. 23,No. 1, pp. 78-87.

4. Jou J M, Kuang S R and Chen R D (1999),“Design of Low-Error Fixed-WidthMultipliers for DSP Applications”, IEEETrans. Circuits Syst., Vol. 46, No. 6,pp. 836-842.

5. Jou S J and Wang H H (2000), “Fixed-Width Multiplier for DSP Application”, inProc. IEEE Int. Symp. Comput. Des.,September, pp. 318-322.

6. Kidambi S S, El-Guibaly F and AntoniouA (1996), “Area-Efficient Multipliers forDigital Signal Processing Applications”,IEEE Trans. Circuits Syst. II, Exp. Briefs,Vol. 43, No. 2, pp. 90-95.

7. Kuang S R and Wang J P (2006), “Low-Error Conf igurable TruncatedMultipliers for Multiply-AccumulateApplications”, Electron. Lett., Vol. 42,No. 16, pp. 904-905.

8. Lim Y C (1992), “Single-PrecisionMultiplier with Reduced Circuit Complexityfor Signal Processing Applications”, IEEETrans. Comput. , Vol. 41, No. 10,pp. 1333-1336.

9. Petra N, Caro D D, Garofalo V, Napoli Nand Strollo A G M (2010), “TruncatedBinary Multipliers with Variable Correction

36


and Minimum Mean Square Error”, IEEETrans. Circuits Syst., Vol. 57, No. 6,pp. 1312-1325.

10. Premananda B S, Samarth S Pai,Shashank B and Shashanl S Bhat (2013),“Design and Implementation of 8-Bit VedicMultiplier”, International Journal ofAdvanced Research in Electrical,Electronics and InstrumentationEngineering, Vol. 2, No. 12, pp. 5877-5882.

11. Schulte M J and Swartzlander E E (1993),“Truncated Multiplication with CorrectionConstant”, in Proc. Workshop VLSISignal Process., Vol. 6, pp. 388-396.

12. Shim B, Sridhara S and Shanbhag N R(2004), “Reliable Low-Power DigitalSignal Processing via Reduced Precision

Redundancy”, IEEE Trans. Very LargeScale Integr. (VLSI) Syst., Vol. 12, No. 5,pp. 497-510.

13. Strollo A G M, Petra N and Caro D D(2005), “Dual-Tree Error Compensationfor High Performance Fixed-WidthMultipliers”, IEEE Trans. Circuits Syst. II,Exp. Briefs, Vol. 52, No. 8, pp. 501-507.

14. The International Technology Roadmapfor Semiconductors [Online] (2009),available http://public.itrs.net/

15. Wey I C and Wang C C (2010), “Low-Errorand Area-Efficient Fixed Width Multiplierby Using Minor Input Correction Vector”,in Proc. IEEE Int. Conf. Electron. Inf.Eng., Vol. 1, August, pp. 118-122, Kyoto,Japan.

implementation of power efficient 12-bit vedic multiplier …

Documents