
Power-Area-Performance Characteristics of FPGA-based Sigma-Delta FIR Filters

Tayab D. Memon & Paul Beckett & Amin Z. Sadik

Received: 11 April 2011 / Revised: 15 December 2011 / Accepted: 3 February 2012 / Published online: 29 February 2012
© Springer Science+Business Media, LLC 2012

Abstract While one-bit ΣΔ modulators are widely used in Analog to Digital conversion stages due to their inherent linearity and precision, it is less common for the entire digital processing path to operate in single-bit mode at the oversampled rate of the conversion system. The conventional approach has been to decimate the signal bit stream after conversion and for the remaining processing to be performed in standard multi-bit binary at the Nyquist rate and with a resolution mandated by the dynamic range and noise. Using a Finite Impulse Response filter design as an example, we compare the area and performance of this conventional approach with the alternative single-bit approach that operates directly on the ΣΔ data stream using ternary coefficients {−1, 0, +1} derived from the ΣΔ modulation of the target impulse response. Filters exhibiting approximately equivalent spectral performance in the two alternative approaches were developed using VHDL and simulated on several commercial FPGA types. In these experiments, the single-bit filters using ternary coefficients were found to dissipate less power than the conventional approach despite their need to operate at much higher clock rates. They also exhibit up to 40% higher performance and offer useful area savings at lower filter orders. At higher orders, the ΣΔ approach retains its power and performance advantages but exhibits slightly higher chip area. The simplicity and low power of the ΣΔ approach make it applicable to mobile communication processing using low-cost FPGA technology.

Keywords Single-bit filters · Sigma Delta Modulation · Ternary Filter · FPGAs

1 Introduction

Although rapid advances in Very Large Scale Integration (VLSI) have made it possible to implement fast and efficient DSP functions in hardware, there is continuing pressure towards smaller area and higher performance at low power consumption in portable devices. As a result, there has been much research into finding optimal hardware implementations that fulfill these competing requirements [1–4]. For example, the characteristics of Finite Impulse Response (FIR) digital filters, which are widely used in signal processing applications, depend directly on the complexity of their essential multiplication steps, whose number, in turn, increases linearly with the order of the filter. Despite the many optimizations that have been proposed, a large number of multiplication stages still translates into large area, delay and power consumption.

One-bit ΣΔ modulators are widely used in AD and DA conversion stages due to their inherent linearity and precision. However, it is less common for the entire digital processing path to operate on single-bit data. The more usual approach has been to decimate the signal data stream after conversion and for the remaining processing to be performed in standard binary at the Nyquist rate and with a resolution mandated by dynamic range and noise considerations.

T. D. Memon
Department of Electronic Engineering, Mehran University of Engineering and Technology (MUET), Jamshoro, Sindh, Pakistan

P. Beckett · A. Z. Sadik
School of Electrical and Computer Engineering, Royal Melbourne Institute of Technology (RMIT), 124 Latrobe Street, Melbourne, Victoria 3000, Australia

P. Beckett
e-mail: [email protected]

A. Z. Sadik
e-mail: [email protected]

Present Address:
T. D. Memon (*)
School of Electrical and Computer Engineering, Royal Melbourne Institute of Technology (RMIT), 124 Latrobe Street, Melbourne, Victoria 3000, Australia
e-mail: [email protected]

J Sign Process Syst (2013) 70:275–288
DOI 10.1007/s11265-012-0664-8

Sigma Delta Modulation (ΣΔM) encoding of the FIR filter coefficients has been shown to be an efficient way to reduce the complexity of the multiplier and improve its area–performance tradeoffs [2]. The simple arithmetic of single-bit DSP systems results in efficient hardware implementations that map well to FPGA resources, which comprise flip-flops plus simple logic blocks and/or look-up tables. The advantages of single-bit systems were first identified in [1] and further developed in [3, 4] and [5]. More recently, general-purpose Short Word Length (SWL) DSP applications, including classical LMS algorithms, have been described in [6, 7].

While the general structure of a single-bit filter is similar to its multi-bit counterpart, it has to operate at a large Over Sampling Ratio (OSR) in order to achieve an equivalent level of Signal to Quantization Noise Ratio (SQNR). The order of the single-bit filter is directly proportional to the OSR: increasing the OSR increases the order of the filter and improves the SQNR at the expense of more hardware. At the same time, a high operating frequency is required to achieve a given level of SQNR, which makes the implementation more challenging.

Single-bit ΣΔM DSP systems have tended to be applied to ADC and audio processing applications. However, recent results (e.g., [8]) have shown that they can be operated at clock speeds in excess of 400 MHz and with a dynamic range beyond 70 dB, making them suitable for video processing applications as well. Nevertheless, it is still not immediately clear whether the use of ΣΔ modulated coefficients on short word-length (i.e., binary or ternary) data will result in smaller or more power-efficient filter designs. Towards this end, comparisons between FPGA implementations of ternary and multi-bit binary FIR filters were presented in [9]. In this paper, we extend that work and present area, power and performance comparisons for a range of single-bit and multi-bit FIR filter designs with equivalent spectral performance. Both filter types have been synthesized on commercial FPGA devices using pipelined and non-pipelined organizations. While the design of single-bit DSP applications has been proposed in [6, 7, 10], the authors are unaware of other work undertaken on the hardware implementation of single-bit FIR filters.

The remainder of this paper proceeds as follows. In Section 2, FIR filter design techniques and their FPGA implementation are briefly introduced. In Section 3, a single-bit FIR filter is derived in Matlab® based on an example Target Impulse Response. The results for single-bit and conventional FIR filters, implemented in VHDL and synthesized for two FPGA devices, are described in Section 4. Finally, in Section 5, we summarize and conclude the paper.

2 FIR Filter Design Techniques

As outlined above, it is the performance of the multiply-accumulate (MAC) stages that has the greatest impact on the overall behavior of digital filters, including FIR types. Thus, various filter design techniques have been proposed that specifically target the complexity of these stages. For example, distributed arithmetic is a common technique that has been used in FPGA designs for many years [11, 12], in which the multiplication stages are performed using Look-up Tables (LUTs), thereby reducing the overall size of the hardware. In [13], Systolic Distributed Arithmetic was used to improve the area-performance-power tradeoffs of an FIR filter design implemented on a Xilinx Virtex-E device at various filter orders but with a fixed coefficient bit-width (i.e., L = 8). It was observed that the best tradeoffs between area-performance and power can be achieved at an address length of four.
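To make the LUT idea concrete, here is a minimal bit-serial distributed-arithmetic (DA) inner product in Python. It is a sketch under stated assumptions, not the systolic design of [13]: the coefficients are integers, the inputs are two's-complement words of a fixed width, and the LUT address length equals the number of taps (four here). The helper names `build_da_lut` and `da_dot` are hypothetical.

```python
def build_da_lut(coeffs):
    """Precompute all partial sums: entry `addr` is the sum of coeffs[k]
    over every bit k that is set in `addr` (2**K entries for K taps)."""
    return [
        sum(c for k, c in enumerate(coeffs) if (addr >> k) & 1)
        for addr in range(1 << len(coeffs))
    ]


def da_dot(coeffs, xs, bits):
    """Bit-serial DA evaluation of sum(c * x) for `bits`-wide two's-complement xs.

    One LUT lookup per bit plane replaces the K multiplications.
    """
    lut = build_da_lut(coeffs)
    acc = 0
    for n in range(bits):
        # Gather bit n of every input sample into a K-bit LUT address.
        addr = 0
        for k, x in enumerate(xs):
            addr |= ((x >> n) & 1) << k
        weight = 1 << n
        if n == bits - 1:
            acc -= lut[addr] * weight  # sign bit carries weight -2**(bits-1)
        else:
            acc += lut[addr] * weight
    return acc


if __name__ == "__main__":
    coeffs = [3, -5, 7, 2]
    xs = [-8, 7, -1, 5]  # 4-bit two's-complement samples
    print(da_dot(coeffs, xs, 4), sum(c * x for c, x in zip(coeffs, xs)))  # -56 -56
```

The LUT grows as 2^K, which is why the address length (four in [13]) is a key design parameter.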

Many other techniques have been proposed: Canonical Sign Digit (CSD) [14]; the Dempster Method [15]; Mirror Symmetric Filter Pairs [16]; two-stage parallelism [17] and Redundant Binary Schemes [18], to name just a few. Methods specifically aimed at FPGA-based FIR filter implementations include the fully pipelined and full-parallel transposed form [19], the Add-and-Shift method with advanced calculation [20] and hardware-efficient distributed arithmetic for higher orders [11, 12]. In [17], a new design technique based on a linear phase prototype filter that exploits coefficient symmetry was shown to offer better performance at a hardware cost similar to that of linear phase filters. The same work [17] also reported a transposed direct-form with CSD multipliers that offers better area-performance tradeoffs when using classical methods.

Apart from the classical multiplier complexity reduction techniques, a new approach called Slice Reduction Graphs (SRG) [19], which reduces area by minimizing the multiplier block logic depth and pipeline registers, has been shown to offer improved area-performance over the Reduced Adder Graph (RAG) and Distributed Arithmetic (DA) techniques. In [19], simulations were carried out at coefficient bit-widths in the range of 2–20 bits, while keeping the order of the filter constant (i.e., at 51). The order of the filter was then varied in the range 10–250 at fixed coefficient bit-widths. The maximum average operating frequency achieved by the proposed technique was in the range of 175–180 MHz at the lowest filter order, further reducing towards 150–160 MHz as the filter order increased above 60.

The primary intent of the techniques mentioned above has been to improve the area-performance characteristics of parallel multi-bit binary filters operating at the Nyquist rate. However, the format of the coefficients and input data is itself one reason for the high complexity of the MAC stages. In [1, 3, 4], the complexity of the filter coefficients has been addressed by employing a simple single-bit coefficient format. This technique can reduce the hardware complexity of multipliers to simple AND-OR logic or small look-up table (LUT) organizations.

Of course, the issue of multiplier complexity can always be addressed by assuming constant coefficient multiplication. Whereas a full multiplier can handle any combination of two multiplicands, if one of the multiplicands is a constant, a far more efficient implementation will involve simple look-up tables plus adder/subtractor modules [21], although the actual complexity of these multiplier structures depends entirely on the value of the constant. Further, if the coefficient is zero, the multiplication step can be removed completely. Although it is difficult to generalize, in our work we have observed that of the order of 30% of coefficients, across a range of FIR filter designs, are zero. Obviously, this simplification is available to single-bit filters as well. In this case, multiplication of a data bit D_i by the symbols −1 and +1 can be replaced by a simple wiring connection to −D_i and D_i respectively, while a coefficient of zero implies that the connection can be removed completely, along with any part of the “accumulate” function to which it connects.
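The following Python sketch applies the direct-form convolution y(k) = Σ h_i x(k − i) with ternary taps to a single-bit stream using only additions, subtractions and pruned connections. Representing the single-bit input as ±1 values, and the function name `ternary_fir`, are assumptions for illustration.

```python
def ternary_fir(taps, x):
    """Direct-form FIR with taps in {-1, 0, +1} and input symbols in {-1, +1}.

    No multiplier is used: a +1 tap adds the delayed sample, a -1 tap subtracts
    it, and a 0 tap is removed from the structure altogether.
    """
    if any(h not in (-1, 0, 1) for h in taps):
        raise ValueError("taps must be ternary")
    active = [(i, h) for i, h in enumerate(taps) if h != 0]  # prune zero taps
    y = []
    for k in range(len(x)):
        acc = 0
        for i, h in active:
            if k - i < 0:
                continue  # delay line starts empty (contributes 0)
            if h > 0:
                acc += x[k - i]
            else:
                acc -= x[k - i]
        y.append(acc)
    return y


if __name__ == "__main__":
    print(ternary_fir([1, 0, -1, 1], [1, -1, -1, 1, 1]))  # [1, -1, -2, 3, 1]
```

Only the non-zero taps generate any logic, which is the hardware counterpart of skipping them in the inner loop.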

Although the work reported here relies on examples in which the coefficients are fixed, the context of this work is the creation of a class of SWL filters that will eventually form part of an adaptive system. As such, these filters are unable to take advantage of fixed coefficient simplifications. The descriptions below assume we are comparing full multiply-accumulate structures.

3 Single-Bit ternary FIR-like filter

In this section we examine the architecture of the single-bit ternary FIR-like filter of Fig. 1 [7], designed using a direct-form structure and ternary coefficients. We define a Ternary FIR filter as one in which the coefficients are drawn from the set {−1, 0, +1}. This contrasts with the single-bit binary case, where the coefficients exist in {−1, +1}. In return for the additional bit needed to describe its coefficients, the ternary filter exhibits a higher Signal to Quantization Noise Ratio (SQNR) than the binary case (see Table 1).

This filter architecture comprises two parts: the ternary FIR filter stage followed by a re-modulator (i.e., an IIR filter). The ternary FIR filter (Fig. 2) exhibits a conventional transversal structure and its output y(k) is in a multi-bit format. The IIR re-modulator follows the ternary FIR filter to transform its output back to single-bit format, at the cost of an increase in chip area and lower performance. Despite the inclusion of the separate IIR re-modulator, the FPGA implementation of this filter at an equivalent filter order offers superior performance to the designs reported in [13, 19], at the cost of slightly more hardware. Furthermore, additional coefficient quantization is unnecessary, as the coefficients are already in single-bit format.

3.1 Ternary FIR Filter

Although the taps of the ternary filter are constrained to the ternary set, it can be seen that its overall architecture is identical to the direct form of its multi-bit counterpart (Fig. 2). The ternary filter output y(k) is given by the convolution of the taps h_i and the input signal x(k) as:

$$y(k) = \sum_{i=0}^{M} h_i \, x(k-i) \qquad (1)$$

where M is the order of the filter (≡ number of taps) and h_i represents the ternary FIR filter coefficients. The ternary taps can be generated using a second-order sigma-delta modulator (ΣΔM) as reported in [7, 18]. The essential requirements for this ΣΔM structure (Fig. 3) are to achieve a flat pass-band across the desired frequency band and for the output of the quantizer to be in ternary format. The z-domain transfer function of the second-order ΣΔM (Fig. 3) is [22]:

$$H(z) = N(z)\,z^{-1} + E(z)\left(1 - 2z^{-1} + z^{-2}\right) \qquad (2)$$

where N(z) represents the target impulse response and E(z) is the quantization noise. In a ΣΔM, the inherent filtering term (1 − 2z^{-1} + z^{-2}) = (1 − z^{-1})^2 is responsible for the noise-shaping effect.

[Figure 1 residue: Ternary FIR Filter (multi-bit format) followed by Sigma-Delta Modulator (single-bit format); signals x(k), y(k), n(k), r(k); loop elements u and z^-1]
Figure 1 General block diagram of single-bit ternary FIR-like filter structure (adapted from [7]).

The frequency response of this ΣΔM is given by:

$$H_{\Sigma\Delta T}(e^{j\Omega}) = N(e^{j\Omega})\,e^{-j\Omega} + E(e^{j\Omega})\left(1 - 2e^{-j\Omega} + e^{-2j\Omega}\right) \qquad (3)$$

where Ω = 2πf/f_S is the normalized frequency (radians). In the same way, the in-band noise of a 2nd-order sigma-delta modulator can be defined as [22]:

$$\sigma_{ey}^2 = \sigma_e^2\,\frac{\pi^4}{5}\left(\frac{2f_B}{f_s}\right)^5 \qquad (4)$$

where 2f_B/f_s = 1/OSR, σ_e² = Δ²/12 is the mean square error and the step size Δ has its standard definition. Considering an N-bit ADC with 2^N quantization levels, Δ = (x_max − x_min)/2^N [23]. The signal-to-quantization noise ratio (SQNR) of the 2nd-order sigma-delta modulator can then be defined as:

$$\mathrm{SQNR} = 10\log_{10}(\sigma_x^2) - 10\log_{10}(\sigma_e^2) - 10\log_{10}\!\left(\frac{\pi^4}{5}\right) + 15.05\,r \;\;\text{(dB)} \qquad (5)$$

where σ_x² and σ_e² are the signal and quantization error powers (variances) and OSR = f_s/2f_B = 2^r, so that the last term equals 10 log10(OSR^5). This expression illustrates the direct relationship between the SQNR and the resolution of the ADC: every additional bit of resolution in a 2nd-order ΣΔM ADC increases the SQNR by about 6 dB, and an additional half bit adds about 3 dB. On the other hand, every doubling of the OSR adds approximately 15 dB of SQNR [23].
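As a numerical check of (4) and (5), the Python sketch below integrates the shaped noise power |1 − e^{−jΩ}|^4 σ_e²/π over the signal band [0, π/OSR] and compares it with the closed form, then evaluates (5). It assumes white quantization noise spread uniformly over [0, π] (the usual model behind (4)), a 1-bit quantizer on [−1, 1] and a unit-amplitude sine as the test signal; the function names are illustrative.

```python
import math


def inband_noise_numeric(sigma_e2, osr, steps=20000):
    """Midpoint-rule integral of the 2nd-order shaped noise PSD over [0, pi/OSR]."""
    omega_b = math.pi / osr
    h = omega_b / steps
    total = 0.0
    for i in range(steps):
        w = (i + 0.5) * h
        ntf_sq = (2.0 * math.sin(w / 2.0)) ** 4  # |1 - e^{-jw}|^4 = 16 sin^4(w/2)
        total += ntf_sq * h
    return sigma_e2 / math.pi * total  # white noise: PSD sigma_e2/pi on [0, pi]


def inband_noise_closed_form(sigma_e2, osr):
    """Eq. (4) with 2f_B/f_s = 1/OSR."""
    return sigma_e2 * (math.pi ** 4 / 5.0) * (1.0 / osr) ** 5


def sqnr_db(sigma_x2, sigma_e2, osr):
    """Eq. (5); the last term 50*log10(OSR) equals 15.05*r for OSR = 2**r."""
    return (10 * math.log10(sigma_x2) - 10 * math.log10(sigma_e2)
            - 10 * math.log10(math.pi ** 4 / 5.0) + 50 * math.log10(osr))


if __name__ == "__main__":
    delta = 2.0 / 2 ** 1          # (x_max - x_min) / 2**N with N = 1 bit on [-1, 1]
    sigma_e2 = delta ** 2 / 12    # mean-square quantization error
    for osr in (8, 32, 64, 128):
        ratio = inband_noise_numeric(sigma_e2, osr) / inband_noise_closed_form(sigma_e2, osr)
        # 0.5 is the power of a unit-amplitude sine (assumed test signal)
        print(osr, round(ratio, 4), round(sqnr_db(0.5, sigma_e2, osr), 1))
```

The ratio approaches 1 as the OSR grows, because (4) relies on the small-angle approximation sin(Ω/2) ≈ Ω/2 inside the signal band.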

3.2 Generation of Ternary FIR filter in Matlab

The generation of a ternary FIR filter (e.g., in Matlab) commences with the selection of the Target Impulse Response (TIR). In the following sections we have used a lowpass filter example with the following specifications: Sampling Frequency 8000 Hz, Passband 0–800 Hz, Stopband 1200–4000 Hz, Passband Ripple (δp) 1.5 dB and Stopband Attenuation (δs) of 90 dB. The Target Impulse Response was generated using the Remez exchange algorithm. The optimum order of the filter for these specifications was found to be 63. The desired TIR, with passband and stopband edge frequencies of 0.2π and 0.3π respectively (800 Hz and 1200 Hz normalized to the 4000 Hz Nyquist frequency) and with 90 dB of stopband attenuation, is shown in Fig. 4.

To satisfy the input oversampling requirement of the sigma-delta modulator, the coefficients must be scaled before encoding into the ternary format so that the modulator's peak input lies at the level giving the maximum signal-to-quantization-noise ratio (SQNR), fully utilizing the available dynamic range. FFT-based scaling is one of the efficient techniques reported in [24] and has been used here. The taps are encoded into a ternary format after scaling (i.e., oversampling). It is worth noting here that using a ternary format for the coefficients results in better SQNR than binary [25].

The ternary filter (i.e., with ternary coefficients) exhibits the same impulse response as the TIR (specifically, in the passband) but with a number of taps proportional to the OSR [8]. The ternary coefficients r(k), which represent the quantizer output levels of the second-order sigma-delta modulator, were derived using:

$$r(k) = \begin{cases} +1, & w(k) > b/4 \\ 0, & -b/4 < w(k) < b/4 \\ -1, & w(k) < -b/4 \end{cases} \qquad (6)$$

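Table 1 Signal-to-noise ratio comparison of single-bit and multi-bit FIR filters.

| N | OSR | Single-bit SQNR, binary (dB) | Single-bit SQNR, ternary (dB) | Multi-bit coefficient bit width | Multi-bit SNR at 5 dB/bit (dB) | Multi-bit SNR at 6 dB/bit (dB) |
| 64 | 8 | 34 | 43 | 8 | 40 | 48 |
| 64 | 32 | 64 | 73 | 12 | 60 | 72 |
| 64 | 64 | 79 | 88 | 16 | 80 | 96 |
| 64 | 128 | 94 | 103 | 18 | 90 | 108 |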

Figure 2 Block diagram of ternary FIR filter (adapted from [7]).


where [−b/2, b/2] is the dynamic range of the sigma-delta modulator and w(k) is the input to the ternary quantizer. The ternary coefficients and the binary input stream generated at this stage according to the filter specification given above were used to simulate the FIR filter implemented in VHDL, described in Section 4.
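The Python sketch below illustrates (2) and (6) together: a second-order modulator whose output is R = z^{-1}X + (1 − z^{-1})^2 E, driving the three-level quantizer of (6) with b = 2, so that the levels −1, 0, +1 sit inside [−b/2, b/2]. It uses an error-feedback realization, which is our assumption and may differ from the integrator topology of Fig. 3. The FFT-based scaling and oversampling of the TIR are not reproduced here, so the input is assumed to be already scaled and oversampled; the function names are hypothetical.

```python
def ternary_quantize(w, b=2.0):
    """Three-level quantizer of Eq. (6): thresholds at +/- b/4, output in {-1, 0, +1}."""
    if w > b / 4:
        return 1
    if w < -b / 4:
        return -1
    return 0  # covers -b/4 <= w <= b/4 (Eq. (6) leaves the exact thresholds open)


def sigma_delta2_ternary(x, b=2.0):
    """Second-order error-feedback modulator with R = z^-1 X + (1 - z^-1)^2 E.

    x is assumed to be already scaled and oversampled.
    """
    x_prev = 0.0      # x(k-1): supplies the z^-1 signal term
    e1 = e2 = 0.0     # quantization errors e(k-1), e(k-2)
    r_out = []
    for xk in x:
        w = x_prev - 2.0 * e1 + e2    # quantizer input w(k)
        r = ternary_quantize(w, b)
        e = r - w                      # e(k) = r(k) - w(k)
        r_out.append(r)
        e1, e2 = e, e1                 # shift the error history
        x_prev = xk
    return r_out


if __name__ == "__main__":
    r = sigma_delta2_ternary([0.3] * 4000)
    print(sorted(set(r)), sum(r) / len(r))  # mean of the ternary stream tracks the DC input
```

Because r(k) = x(k − 1) + e(k) − 2e(k − 1) + e(k − 2), the error terms telescope when averaged, so the mean of the ternary output converges to the mean of the input while the quantization noise is pushed to high frequencies.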

3.3 IIR Re-modulator

The output of the ternary FIR filter is in multi-bit format and includes a high-frequency noise component. To overcome this, a sigma-delta modulation based IIR filter, first reported in [26], was used as an IIR re-modulator in [7] (see Fig. 1). The sigma-delta modulator is treated as a delay element, as described in [26]. The major advantage of this structure is its simplicity and robustness compared, for example, to the circuits proposed for similar applications in [4, 27]. Its gain is controlled by two parameters, u and v, and because of the sigma-delta modulator in its feedback loop [7, 28], the stability of the overall filter system depends primarily on this IIR stage. The transfer function of the re-modulator (Fig. 1) is given by:

$$H_{\mathrm{IIR}}(z) = H_{\mathrm{IIRS}}(z) + H_{\mathrm{IIRN}}(z) \qquad (7)$$

where S and N denote the signal and noise components. H_IIRS is given by:

$$H_{\mathrm{IIRS}}(z) = \frac{v\,z^{-1}}{1 - (1-u)\,z^{-1}} \qquad (8)$$

and H_IIRN by:

$$H_{\mathrm{IIRN}}(z) = \frac{1 - z^{-1}}{1 - (1-u)\,z^{-1}} \qquad (9)$$
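One time-domain loop consistent with a signal path v z^{-1}/(1 − (1 − u)z^{-1}) and a noise path (1 − z^{-1})/(1 − (1 − u)z^{-1}) is an accumulator s(k) = s(k − 1) + v·x(k − 1) − u·q(k − 1) followed by a one-bit quantizer q(k) = ±1; writing q = s + e recovers both transfer functions exactly. The Python sketch below assumes this realization (the circuit of [26] may be organised differently) and assumes the multi-bit ternary-FIR output has been normalized so that |v·x/u| < 1. The function name is hypothetical.

```python
def iir_remodulator(x, u, v):
    """Re-modulate a multi-bit sequence x to a +/-1 stream.

    Loop: s(k) = s(k-1) + v*x(k-1) - u*q(k-1), q(k) = sign(s(k)).
    Writing q = s + e gives Q = [v z^-1 X + (1 - z^-1) E] / (1 - (1 - u) z^-1);
    the DC signal gain is therefore v/u.
    """
    s = 0.0
    x_prev = 0.0
    q_prev = 0
    q_out = []
    for xk in x:
        s += v * x_prev - u * q_prev      # accumulator (quantizer input)
        q = 1 if s >= 0.0 else -1         # one-bit quantizer
        q_out.append(q)
        x_prev, q_prev = xk, q
    return q_out


if __name__ == "__main__":
    q = iir_remodulator([0.2] * 5000, u=0.5, v=0.25)
    print(sum(q) / len(q))  # close to (v/u) * 0.2 = 0.1
```

Here u sets the pole at 1 − u and, together with v, the passband gain v/u, which is why the stability and gain of the whole filter hinge on these two parameters.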

3.4 Transfer Function of Single-Bit Ternary FIR Filter

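The overall transfer function of the single-bit ternary FIR filter (shown in Fig. 1) is given by the product of the ternary and IIR re-modulator parts and can be described as:

$$H_{\mathrm{FIL}}(e^{j\Omega}) = H_{\Sigma\Delta T}(e^{j\Omega}) \cdot H_{\mathrm{IIR}}(e^{j\Omega}) \qquad (10)$$

From (2) and (7) we obtain:

$$H_{\mathrm{FIL}}(e^{j\Omega}) = H_{\Sigma\Delta T}(e^{j\Omega}) \cdot \left(H_{\mathrm{IIRS}}(e^{j\Omega}) + H_{\mathrm{IIRN}}(e^{j\Omega})\right) \qquad (11)$$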

[Figure 3 residue: modulator loop with two z^-1 delay stages, adders and quantizer feedback of r(k); signals n(k), w(k), r(k)]
Figure 3 Second Order ΣΔM architecture.

Figure 4 Target Impulse Response by Remez Exchange Algorithm.

which can be further expressed as:

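$$H_{\mathrm{FIL}}(e^{j\Omega}) = N(e^{j\Omega})\,\frac{e^{-j\Omega} + e^{-2j\Omega}(v-1)}{1-(1-u)e^{-j\Omega}} + E(e^{j\Omega})\,\frac{1 + e^{-j\Omega}(v-3) + e^{-2j\Omega}(3-2v) + e^{-3j\Omega}(v-1)}{1-(1-u)e^{-j\Omega}} \qquad (12)$$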

As is evident from the general form of their transfer function, the overall simplicity of short word length techniques can result in a very simple hardware implementation, especially on fine-grained devices such as FPGAs. An implementation of this ternary filter is discussed in the next section.
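Both fractions in (12) share the denominator 1 − (1 − u)e^{−jΩ}; their numerators follow from multiplying the signal term z^{-1} and the noise-shaping term (1 − 2z^{-1} + z^{-2}) of the ΣΔM by the common IIR numerator v z^{-1} + (1 − z^{-1}) = 1 + (v − 1)z^{-1}. A short Python check of that polynomial algebra (coefficients listed in ascending powers of z^{-1}; the helper names are illustrative):

```python
def poly_mul(a, b):
    """Multiply two polynomials in z^-1 given as coefficient lists (constant term first)."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out


def iir_numerator(v):
    """v z^-1 + (1 - z^-1) = 1 + (v - 1) z^-1."""
    return [1.0, v - 1.0]


def signal_numerator(v):
    """z^-1 * (1 + (v - 1) z^-1): numerator of the N(e^{jW}) term."""
    return poly_mul([0.0, 1.0], iir_numerator(v))


def noise_numerator(v):
    """(1 - 2z^-1 + z^-2) * (1 + (v - 1) z^-1): numerator of the E(e^{jW}) term."""
    return poly_mul([1.0, -2.0, 1.0], iir_numerator(v))


if __name__ == "__main__":
    print(noise_numerator(0.5))   # [1.0, -2.5, 2.0, -0.5]
    print(signal_numerator(0.5))  # [0.0, 1.0, -0.5]
```

For any v the noise numerator is [1, v − 3, 3 − 2v, v − 1], matching the E-term of (12) coefficient by coefficient.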

4 Single-bit ternary FIR-like Filter Design in VHDL

As already described above, the basic structure of a single-bit ternary FIR-like filter (Fig. 5) comprises two components: the ternary FIR filter and the IIR re-modulator. The ternary section is a typical FIR-like filter that multiplies the coefficient taps by the binary input and then adds the partial products. The IIR re-modulator section then converts the FIR output back to single-bit format and removes high-frequency and quantization noise components. This section discusses the overall architecture of the single-bit FIR filter of Fig. 1 as implemented in VHDL.

4.1 Single-Bit FIR Filter Hardware Implementation

As the ΣΔM typically operates at a high Oversampling Ratio (OSR), a large number of taps may be required for a single-bit filter. For example, compared with a multi-bit (i.e., conventional binary) FIR filter with 64 taps, an approximately equivalent single-bit filter will require 2048 ternary taps at an OSR of 32. Our implementation divides this into N coefficient multiply blocks followed by an adder tree with log2 N levels to perform the summation. Choosing the number of taps to be a power of two (i.e., 2^n) therefore simplifies the implementation of the addition stage. As described earlier, many algorithms have been developed to reduce latency as well as improve the performance of multi-bit FIR filters [12, 13, 19]. To achieve improved performance with a smaller number of LUTs, we have focused on techniques that map efficiently onto FPGA organizations but are also suitable for ASIC implementation.

To deal with signed arithmetic in FPGAs, 2's complement format is a reasonable choice. The coefficient symbols {+1, 0, −1} can be easily mapped to two-bit numbers in 2's complement as: +1→01, 0→00, −1→11. Note that, while 2's complement can simplify the arithmetic, any other dual-rail (2-bit) format is equally applicable here. Using a binary tree structure to sum the partial products over N = 2048 implies eleven stages and a final multi-bit result of ±N, thus requiring a total of 13 bits to completely express the full output range of ±2048 (12 bits reach only −2048 to +2047). As mentioned above, single-bit filters reduce complex multiplication structures such as those employed in [13, 19, 20] to simple AND-OR logic functions that can typically be mapped to a single LUT.
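The stage count and word lengths quoted above can be tabulated with a few lines of Python. The sketch assumes each stage widens its operands just enough to hold the exact sum in two's complement (a sum spanning ±2^l needs l + 2 bits). This word-length model is our assumption, chosen because it reproduces the eleven stages and 13-bit output stated above; the function name is hypothetical.

```python
def adder_tree_plan(n_taps):
    """Return (level, adders, output_bits) for each stage of a binary adder tree that
    sums n_taps ternary partial products, each a 2-bit two's-complement value in {-1, 0, +1}."""
    if n_taps < 2 or n_taps & (n_taps - 1):
        raise ValueError("n_taps must be a power of two")
    levels = n_taps.bit_length() - 1          # log2(n_taps)
    plan = []
    for level in range(1, levels + 1):
        adders = n_taps >> level               # adder count halves at every stage
        out_bits = level + 2                   # sums lie in [-2**level, 2**level]
        plan.append((level, adders, out_bits))
    return plan


if __name__ == "__main__":
    plan = adder_tree_plan(64 * 32)            # 64-tap target at OSR = 32
    print(len(plan), plan[0], plan[-1], sum(a for _, a, _ in plan))  # 11 (1, 1024, 3) (11, 1, 13) 2047
```

The power-of-two restriction mirrors the design choice above: with 2^n taps every stage pairs its inputs exactly, so no odd operand has to be carried forward.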

4.2 Ternary Multiplier and Adder Modules

A small fragment of the adder tree is shown in Fig. 6. The overall number of addition blocks halves at each successive adder stage while their length increases by one bit, culminating in the final multi-bit output. If we consider 2048 data/coefficient pairs, so that the adder tree comprises 11 levels, the first few adder stages will have inputs in the range of 2 to 8 bits, which can easily be mapped to a single LUT in a typical FPGA architecture, while the remainder will comprise small ripple-carry blocks from three to 12 bits long. Note that it would be equally possible to use optimized IP blocks created specifically for this purpose. In this paper, we have taken a more general approach, so that our implementation results might be considered to be worst-case.

[Figure 5 residue: Data In and Coeff In clocked shift registers (R) holding 2-bit codes (01, 11), Mult Blocks, log2 N adder stages, First Order IIR Filter; signals A, B, S, Y]
Figure 5 Block diagram of single-bit ternary FIR-like Filter Architecture.

As highlighted above, the main advantage of the ternary design is its simple hardware implementation, especially with respect to the multiplier blocks that are typically the most complex modules in multi-bit filters. With the two-bit codes of Section 4.1 (+1→01, 0→00, −1→11), the Boolean logic of the ternary multiplier outputs (m1, m0) for a coefficient code (c1c0) and an input code (d1d0) is simply:

$$m_1 = c_0\,\overline{c_1}\,d_1 + c_1\,d_0\,\overline{d_1}, \qquad m_0 = c_0\,d_0 \qquad (13)$$

where c1, c0 are the bits of the ternary coefficient code and d1, d0 are the bits of the data code. This simple implementation of the single-bit multiplier and short adder modules results in a robust and efficient design that exhibits significant advantages over its complex multi-bit counterpart.
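Because the two-bit codes fix the logic completely, the multiplier can be checked exhaustively. The Python sketch below implements m0 = c0·d0 and m1 = c0·c̄1·d1 + c1·d0·d̄1, derived here from the encoding +1→01, 0→00, −1→11, and confirms all nine coefficient/data combinations. The dictionaries and the function name are illustrative.

```python
# Two-bit codes (bit1, bit0): +1 -> 01, 0 -> 00, -1 -> 11.
ENCODE = {+1: (0, 1), 0: (0, 0), -1: (1, 1)}
DECODE = {code: value for value, code in ENCODE.items()}


def ternary_mult(c, d):
    """Gate-level ternary multiply of codes c = (c1, c0) and d = (d1, d0) -> (m1, m0)."""
    c1, c0 = c
    d1, d0 = d
    m0 = c0 & d0                                      # non-zero only if both operands are
    m1 = (c0 & (c1 ^ 1) & d1) | (c1 & d0 & (d1 ^ 1))  # negative iff exactly one operand is -1
    return m1, m0


if __name__ == "__main__":
    for a in (+1, 0, -1):
        for b in (+1, 0, -1):
            m = ternary_mult(ENCODE[a], ENCODE[b])
            print(f"{a:+d} x {b:+d} -> {m[0]}{m[1]} ({DECODE[m]:+d})")
```

Each output bit depends on at most four inputs, so the whole multiplier fits comfortably in a single 4-input LUT per output bit.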

4.3 Multi-bit FIR Filter Design

As there is no essential architectural difference between single-bit and multi-bit filter organizations (apart from the additional IIR re-modulator that follows the ternary FIR filter in the single-bit filter), similar design methods can be used for both. In the experiments reported here, the multi-bit filter coefficients were converted into fixed point using the fixed-point toolbox available in Matlab [29]. To maintain the fixed-point format, double-precision coefficients generated by Matlab were initially converted into single-precision floating-point format. Coefficients were then further quantized with tight constraints into the required number of fixed-point mantissa bits (i.e., 12, 16, 18 bits) [30]. As expected, some of the precision was lost after quantization, so the filter response shown in Fig. 7 has greater ripple in the stop-band than the target impulse response (i.e., Fig. 4). It can be observed that, as expected, the stop-band ripple diminishes as the number of mantissa bits increases. The filter response comes within about 1 dB of the desired shape at a mantissa bit-width of 18.

Figure 6 Two level fragment of the adder tree structure.

Figure 7 Frequency response of the target filter at various coefficient bit-widths (12, 16 and 18).
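The Python sketch below mimics the last step of that flow with plain round-to-nearest into a signed word of B bits with B − 1 fractional bits. It is not the Matlab fixed-point toolbox used by the authors: the word format, the saturation behaviour, Python's round-half-to-even rule and the example tap values are all assumptions. It shows the worst-case coefficient error bound of half an LSB, 2^{−B}, shrinking as B grows from 12 to 18 bits.

```python
def quantize_fixed(coeffs, word_bits):
    """Round to a signed fixed-point word with word_bits - 1 fractional bits
    (range [-1, 1 - 2**-(word_bits - 1)]), saturating out-of-range values."""
    scale = 1 << (word_bits - 1)
    lo, hi = -scale, scale - 1
    return [max(lo, min(hi, round(h * scale))) / scale for h in coeffs]


def max_abs_error(coeffs, word_bits):
    """Largest coefficient error introduced by quantize_fixed."""
    return max(abs(q - h) for q, h in zip(quantize_fixed(coeffs, word_bits), coeffs))


if __name__ == "__main__":
    taps = [0.0123, -0.0456, 0.1789, 0.4321, 0.1789, -0.0456, 0.0123]  # illustrative values
    for bits in (12, 16, 18):
        print(bits, max_abs_error(taps, bits), 2.0 ** -bits)
```

Each extra bit halves the error bound (about 6 dB), which is consistent with the stop-band ripple in Fig. 7 shrinking as the bit-width increases.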

4.4 Spectral Performance Comparison

Table 1 compares single-bit and multi-bit filters on the basis of their theoretical spectral performance at a fixed filter order of 64 and with varying bit-widths. The binary (B) and ternary (T) SQNR values, computed using (4), have been included in this table simply to illustrate how the coefficient format impacts the filter performance. Using the ternary format yields a significant improvement in SQNR with no impact on hardware area, owing to the 2's complement format mapping discussed in Section 4. In the same way, the multi-bit filter SNR is shown at both the theoretical (6 dB/bit) and practical (5 dB/bit) values, so that the ternary SQNR can be correlated with its corresponding multi-bit SNR.

It should be noted, however, that there are differences in the spectral performances of the filters shown in Table 1 that make them difficult to compare directly on a one-to-one basis. In these experiments, although we attempted to match the spectral performance of the two filter types as closely as possible, it is difficult both to achieve an OSR that is an exact power of 2 and, at the same time, to match the relative spectral performance of the two types. In setting up the corresponding cases in Table 1, we attempted to reduce the difference between the spectral performances of the two filters without concern for their absolute performance relationship. For example, at an OSR of 8, the single-bit filter exhibits an SQNR of 43 dB with ternary (T) and 34 dB with binary (B) coefficients. This level of SQNR improvement obtained using ΣΔM with ternary coefficients is consistent with the average gains of 9.0 dB and 7.0 dB reported in [3, 25].

Coefficient bit width (CBW) is an important design issue for multi-bit filters. When synthesizing the filter hardware, we chose the input bit-width to be the same as the coefficient width at each stage. In this way, a multi-bit filter with N = 64 requires sixty-four multiply blocks and six adder stages (log2 64 = 6) to achieve its final multi-bit output. Note that an alternative scheme could use just one multiplier operated sequentially (or, indeed, any number between 1 and 64) with correspondingly lower overall area and performance. In this paper, we are comparing two equivalent architectures that offer peak performance for a given technology. The corresponding values for the nearest multi-bit filter with an approximately equivalent SNR (i.e., CBW = 8) are 40 dB (5 dB/bit) and 48 dB (6 dB/bit). Doubling the OSR (from 8 to 16) would raise the SQNR of the single-bit filter to 58 dB with ternary format, moving it too far away from the corresponding 48 dB value of the multi-bit filter to be comparable.
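The entries of Table 1 follow two rules of thumb: the single-bit SQNR rises by about 15 dB per doubling of the OSR, as (5) predicts, and the multi-bit SNR is taken as 5 or 6 dB per coefficient bit. The Python sketch below regenerates the table from those rules, anchoring the single-bit columns at the OSR = 8 entries (34 dB binary, 43 dB ternary). That anchoring is our choice for illustration, not the authors' stated procedure, and the function names are hypothetical.

```python
import math


def single_bit_sqnr_db(osr, ref_sqnr_db, ref_osr=8):
    """Extrapolate from a reference (OSR, SQNR) point at ~15.05 dB per doubling of OSR."""
    return ref_sqnr_db + 50.0 * math.log10(osr / ref_osr)


def multi_bit_snr_db(cbw):
    """Rule-of-thumb SNR of a cbw-bit multi-bit filter: (5 dB/bit, 6 dB/bit)."""
    return 5 * cbw, 6 * cbw


if __name__ == "__main__":
    # Reproduces the rows of Table 1: OSR, binary, ternary, CBW, 5 dB/bit, 6 dB/bit.
    for osr, cbw in ((8, 8), (32, 12), (64, 16), (128, 18)):
        binary = single_bit_sqnr_db(osr, ref_sqnr_db=34.0)
        ternary = single_bit_sqnr_db(osr, ref_sqnr_db=43.0)
        print(osr, round(binary), round(ternary), cbw, *multi_bit_snr_db(cbw))
```

The same rule gives 43 + 15.05 ≈ 58 dB at an OSR of 16, the value quoted above.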

5 Simulation Results and Discussion

Both filters were coded in VHDL and compiled, simulated and synthesized in Quartus-II 9.1, on the Cyclone-III EP3C120F484C7 and Stratix-III EP3SL340F151C72 devices using the same vector waveform file input. The filter coefficients and input bit-stream for the single-bit and multi-bit filters, previously generated using Matlab, were held in block RAM. A simulation commenced by transferring the coefficients from memory to the input registers. Following this initialization stage, the data stream was serially shifted into the data registers. On each clock cycle, data and coefficients shift by one bit, triggering new multiply-accumulate operations. The final (multi-bit) output stream was stored in another local memory bank. In the case of the single-bit FIR filter, the filtered multi-bit output was further passed to the IIR filter for single-bit conversion.

5.1 Filter Area-Performance Analysis

Various area-performance tradeoffs for the single-bit and multi-bit filters are presented for both non-pipelined (Table 2) and pipelined (Table 3) modes. The appropriate number of ternary coefficients (TC) was obtained by multiplying the actual number of coefficients by the OSR. For example, at a fixed order of 64 and with an OSR of 32, the number of ternary coefficients (i.e., the order of the single-bit filter) would be 64 × 32 = 2048. The multi-bit simulations, on the other hand, used an adder tree with a structure identical to that of the single-bit filters but with built-in multiplier macro blocks.
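The coefficient counts used in Tables 2 and 3 follow directly from this rule. In the sketch below, the OSR of 128 for the largest filter is inferred from 8192 / 64 rather than stated in this section. `ternary_coefficient_count` is an illustrative name.

```python
N = 64  # fixed order of the target (multi-bit) filter


def ternary_coefficient_count(order, osr):
    """Order of the single-bit filter: one ternary coefficient per
    oversampled tap, i.e. TC = order * OSR."""
    return order * osr


counts = {osr: ternary_coefficient_count(N, osr) for osr in (8, 32, 64, 128)}
print(counts)  # {8: 512, 32: 2048, 64: 4096, 128: 8192}
```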

Because the primary building block (“logic element”) in these devices comprises a small partitioned LUT plus one or more flip-flops (FFs), the other part of a logic element remains available to the rest of the circuit when only one part is used. The overall area metric is therefore determined by the greater of the LUT and FF counts. The approximate number of LUT elements was determined from the flow summary and is reported in Tables 2 and 3 both as a count and as a percentage of the resources available in the device. FMAX has its usual definition as the maximum clock rate achievable with zero slack on the worst-case critical path(s). During the place and route steps, the maximum operating frequency (FMAX) was constrained to a value somewhat higher than achievable for the given technology. The objective was to force the tools to generate a final routing that was comparable across devices. Although the implementations are directly comparable, the results may vary slightly in real applications, as no account was taken of pin capacitance or of specific design optimizations such as forcing the use of I/O registers.

It can be seen from Tables 2 and 3 that the maximum performance of the single-bit filter is consistently superior to that of its corresponding multi-bit filter. For example, in non-pipelined mode the single-bit filter exhibits a 40–50% higher maximum clock frequency (FMAX) over the range of filter orders explored here. Up to medium-order filters (i.e., below TC = 4096), the area cost of both approaches is the same to within 3–4%. Only at the highest order simulated here (TC ≥ 8192) does the area of the single-bit filter significantly exceed that of its multi-bit counterpart. It is also worth noting that even the highest filter orders shown here still fit comfortably into the lowest-cost FPGA device used in this work (i.e., the Cyclone-III).

To achieve a valid comparison with the area and performance simulation results given in [13] (which were carried out with fixed 8-bit coefficients), the single-bit ternary filter simulation was designed to achieve an equivalent spectral performance with 64 fixed coefficients. The resulting ternary filter (at TC = 512, i.e., OSR = 8) achieves almost double the performance reported in [13], at the cost of slightly higher area.

In pipelined mode, additional registers were placed between the adder stages, increasing the throughput of the filter at the cost of a moderate increase in the number of registers and a small increase in latency. Single-bit filters can benefit greatly from pipelining, as the simplicity of their multiplier operation makes it relatively easy to balance the pipeline stage delays optimally. In conventional multi-bit multiplier structures [31], this typically requires deep knowledge of the structure and timing of the multiplier, and possibly a synthesis tool with re-timing capability; it may be impossible when using pre-compiled IP blocks.

It can be seen in Table 3 that the performance of the multi-bit filter decreases linearly from 199 MHz as the coefficient width increases (increasing the SNR), declining by around 31% to 139 MHz at 18 bits. In contrast, FMAX is almost unchanged in the single-bit case under the same conditions (Table 3): the maximum operating frequency of 239 MHz at 2048 coefficients falls by only about 4% as the number of ternary coefficients is successively doubled to 8192.

The single-bit filter organization achieved a maximum improvement of 40% in maximum operating frequency (FMAX) over its corresponding multi-bit filter at the highest order implemented in this work (i.e., 8192). However, it is also clear from Tables 2 and 3 that the difference between the two approaches diminishes at lower OSR values, particularly in pipelined mode. Predictably, the IIR demodulator circuit forming the final stage of the ternary FIR-like filter affects its overall performance and becomes the limiting factor at small OSR values. Thus, the pipelined filter performance at an OSR of 8 remains almost the same as at 32 or 64. On the other hand, at an OSR of 8, the filter performance continues to improve in non-pipelined mode (Table 2), becoming about 27% better at an OSR of 32 and 38% better at an OSR of 64.

Table 2 Area-performance comparison of single-bit FIR vs. multi-bit filter: non-pipelined mode.

Device | Single-bit: Tern. Coeff, LUTs, FMAX (MHz) | Multi-bit: N, W, LUTs, FMAX (MHz)

Cyclone-III | 512, 4089 (3%), 71.4 | 64, 8, 8860 (7%), 46.2

Cyclone-III | 2048, 15603 (13%), 52.6 | 64, 12, 17045 (14%), 35.3

Cyclone-III | 4096, 30894 (26%), 45.3 | 64, 16, 26838 (22%), 29.1

Cyclone-III | 8192, 62747 (53%), 40.3 | 64, 18, 32547 (27%), 26.5

Stratix-III | 512, 3925 (1%), 129.8 | 64, 8, 5219 (2%), 86.5

Stratix-III | 2048, 14368 (5%), 97.3 | 64, 12, 10942 (5%), 69.1

Stratix-III | 4096, 28499 (11%), 82.8 | 64, 16, 17731 (7%), 57.5

Stratix-III | 8192, 55927 (21%), 69.6 | 64, 18, 21568 (8%), 51.2

Table 3 Area-performance comparison of single-bit FIR vs. multi-bit filter: pipelined mode.

Device | Single-bit: Tern. Coeff, LUTs, FMAX (MHz) | Multi-bit: N, W, LUTs, FMAX (MHz)

Cyclone-III | 512, 3963 (3%), 125.6 | 64, 8, 9020 (8%), 94.5

Cyclone-III | 2048, 15399 (13%), 122 | 64, 12, 17079 (14%), 67.1

Cyclone-III | 4096, 30607 (26%), 120 | 64, 16, 26890 (23%), 53.3

Cyclone-III | 8192, 61029 (51%), 118 | 64, 18, 32586 (27%), 47.4

Stratix-III | 512, 3719 (1%), 240 | 64, 8, 4923 (1%), 258.3

Stratix-III | 2048, 14453 (5%), 237 | 64, 12, 10353 (4%), 199.0

Stratix-III | 4096, 28745 (11%), 237 | 64, 16, 16916 (7%), 158.8

Stratix-III | 8192, 57362 (21%), 231 | 64, 18, 20662 (8%), 139.7
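The relative FMAX figures discussed in this section can be recomputed from Tables 2 and 3. The sketch below transcribes the FMAX columns (in MHz) and prints the single-bit advantage for each configuration. A negative value means that the multi-bit filter is faster. `fmax_gain` is an illustrative helper.

```python
# FMAX values (MHz) transcribed from Tables 2 and 3:
# (ternary coefficients, single-bit FMAX, multi-bit FMAX)
tables = {
    ("Cyclone-III", "non-pipelined"): [(512, 71.4, 46.2), (2048, 52.6, 35.3),
                                       (4096, 45.3, 29.1), (8192, 40.3, 26.5)],
    ("Stratix-III", "non-pipelined"): [(512, 129.8, 86.5), (2048, 97.3, 69.1),
                                       (4096, 82.8, 57.5), (8192, 69.6, 51.2)],
    ("Cyclone-III", "pipelined"): [(512, 125.6, 94.5), (2048, 122, 67.1),
                                   (4096, 120, 53.3), (8192, 118, 47.4)],
    ("Stratix-III", "pipelined"): [(512, 240, 258.3), (2048, 237, 199.0),
                                   (4096, 237, 158.8), (8192, 231, 139.7)],
}


def fmax_gain(single, multi):
    """Relative FMAX advantage of the single-bit filter (0.5 = 50% faster)."""
    return single / multi - 1.0


for (device, mode), rows in tables.items():
    gains = ", ".join(f"TC={tc}: {fmax_gain(s, m):+.0%}" for tc, s, m in rows)
    print(f"{device} {mode}: {gains}")
```

The pipelined Stratix-III pair at TC = 512 (240 MHz versus 258.3 MHz) is the only configuration in which the multi-bit filter is faster. This is in line with the narrowing gap at low OSR values described above.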

As the multiplier is the primary critical path in the multi-bit FIR filter structure, it is virtually impossible to balance its pipeline optimally. As mentioned above, this might be achieved by modifying the internal stages of the multiplier, but this was not possible in our work because we were already using highly optimized IP blocks in Quartus II. A disadvantage of these macro-blocks is their inflexibility: it was not possible to add internal pipelining to balance the processing stages optimally.

5.2 Filter Power Analysis

The power dissipation analysis of both filters was performed in Quartus-II 9.1 using the “Power Play” Power Analysis Tool [32] after a signal activity file (.saf) had been generated. The total power of an FPGA device is made up of I/O power, core static power and core dynamic power [32]. We do not report core static power here, as it was observed to be more or less constant across all designs. At the design level, static power is mainly affected by the assignment of unused configurable logic blocks (CLBs) and routing inputs.

In general terms, dynamic power depends upon many factors, including switching activity, design style, the number of logic blocks and interconnects, and input–output data bandwidth [13, 32]. It varies with frequency according to:

$$P = a \cdot F \cdot C \cdot V^{2} \qquad (14)$$

where $a$ is the activity factor (broadly, the probability that a particular node will make a transition at a given time), $C$ is the total load capacitance, $F$ is the transition frequency (usually assumed to be equal or directly proportional to the clock frequency) and $V$ is the supply voltage. Note that short-circuit current during switching also contributes, but this contribution tends to be small when the input and output rise times are roughly equivalent [33], and it is not reported separately in Power Play.
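Equation (14) can be written as a one-line helper to show how strongly the clock rate dominates when $a$, $C$ and $V$ are held fixed. The activity factor, capacitance and supply voltage below are hypothetical placeholders, not values extracted from Power Play. Only the two clock rates, 71.4 MHz and 64 kHz, come from Table 4.

```python
def dynamic_power(a, f_hz, c_farads, v_volts):
    """Equation (14): P = a * F * C * V^2, returned in watts."""
    return a * f_hz * c_farads * v_volts ** 2


# Hypothetical node parameters (assumed, not taken from the paper).
A, C, V = 0.125, 10e-12, 1.2

p_fmax = dynamic_power(A, 71.4e6, C, V)  # Cyclone-III ternary clock, FMAX process
p_f8k = dynamic_power(A, 64e3, C, V)     # Cyclone-III ternary clock, F8K process
print(f"FMAX: {p_fmax * 1e3:.4f} mW, F8K: {p_f8k * 1e6:.4f} uW, "
      f"ratio {p_fmax / p_f8k:.0f}x")
```

For a fixed node, the power ratio is simply the clock ratio. The measured totals in Tables 5 and 6 differ from this idealization because $a$ and the switched capacitance differ between designs and because I/O power does not scale in the same way.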

The area-performance results in Tables 2 and 3 identify the maximum operating frequency that each filter type can achieve at specific filter orders and data lengths using currently available FPGA technology. Equation 14 implies that, for a given technology (i.e., fixed $CV^{2}$), the dynamic power depends directly on both clock frequency and node activity ($aF$). Clock frequency is therefore an important parameter when comparing these filter styles and, in general terms, two choices are possible:

1. to run each filter at its individual FMAX. The resulting spectral performance will be quite different for the two filters, making direct comparison difficult;

2. to operate the filters at a pair of related clock frequencies that results in equivalent spectral performance. In this case, the single-bit filter operating frequency would be OSR × FS, where FS is the frequency of the corresponding multi-bit filter.

The dynamic power simulations outlined below were conducted in two stages. First, each filter was simulated at its maximum clock frequency, as determined by the worst-case FMAX of the single-bit or multi-bit filter in Tables 2 and 3 (labeled FMAX in the results tables). In a second step, the two filter types were set up to meet the specifications outlined in Section 3.2 (i.e., FS = 8000 Hz). As identified in Table 1, a single-bit filter can achieve a spectral performance equivalent to the multi-bit case by increasing its OSR. In this case, therefore, the single-bit filter clock was obtained by multiplying the Nyquist rate by the OSR (i.e., FS × OSR). The multi-bit filter clock was kept at its Nyquist rate (8000 Hz) throughout (labeled F8K in the results tables).
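The F8K clock settings in Table 4 follow from the FS × OSR rule with FS = 8 kHz. In this sketch, the OSR values are inferred from the ternary coefficient counts (TC / 64) rather than read from Table 4 itself. `ternary_clock_hz` is an illustrative name.

```python
FS_HZ = 8_000  # multi-bit (Nyquist-rate) clock in the F8K process


def ternary_clock_hz(fs_hz, osr):
    """Single-bit filter clock needed for equivalent spectral performance."""
    return fs_hz * osr


for tc in (512, 2048, 4096, 8192):
    osr = tc // 64  # fixed filter order of 64
    print(f"TC={tc:>4}  OSR={osr:>3}  "
          f"TClk={ternary_clock_hz(FS_HZ, osr) / 1e3:g} kHz  "
          f"MClk={FS_HZ / 1e3:g} kHz")
```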

Table 4 Clock frequencies for the ternary and multi-bit filters in pipelined and non-pipelined modes.

TClk: ternary clock; MClk: multi-bit clock.

Device | F8K process: TClk (kHz), MClk (kHz) | FMAX process, non-pipelined: TClk (MHz), MClk (MHz) | FMAX process, pipelined: TClk (MHz), MClk (MHz)

Cyclone-III | 64, 8 | 71.4, 46.2 | 125.5, 94.7

Cyclone-III | 256, 8 | 52.6, 35.3 | 122, 67.1

Cyclone-III | 512, 8 | 45.3, 29.1 | 120, 53.3

Cyclone-III | 1024, 8 | 40.3, 26.5 | 118, 47.4

Stratix-III | 64, 8 | 129.8, 86.7 | 240, 258.3

Stratix-III | 256, 8 | 97.3, 69.1 | 239, 199.0

Stratix-III | 512, 8 | 82.8, 57.5 | 237, 158.8

Stratix-III | 1024, 8 | 69.6, 51.2 | 231, 139.7


Table 5 Dynamic power dissipation: FMAX process.

Device | TClk (MHz), MClk (MHz) | #TC, #W | Ternary (mW): CC, CCB, Reg, I/O, Total | Multi-bit (mW): CC, CCB, Reg, I/O, Total

Dynamic power dissipation: non-pipelined mode

Cyclone-III | 71.4, 46.2 | 512, 8 | 8.3, 20.6, 17.3, 14.9, 61 | 292.1, 9.2, 12, 21.3, 335

Cyclone-III | 52.6, 35.3 | 2048, 12 | 7.1, 28.5, 45.1, 12.9, 94 | 366.6, 9.8, 13.2, 24.3, 414

Cyclone-III | 45.3, 29.1 | 4096, 16 | 6.6, 38.8, 80.1, 12.4, 138 | 603.7, 11.9, 15.5, 26.9, 658

Cyclone-III | 40.3, 26.5 | 8192, 18 | 6.2, 47.0, 140.4, 12.0, 206 | 772.2, 12.2, 17.7, 28.5, 831

Stratix-III | 129.8, 86.7 | 512, 8 | 56, 45.5, 28.9, 25.5, 156 | 629.7, 26.4, 37.4, 63.2, 757

Stratix-III | 97.3, 69.1 | 2048, 12 | 43.89, 65.4, 72.3, 21.5, 203 | 831.2, 27.5, 35.7, 52.7, 947

Stratix-III | 82.8, 57.5 | 4096, 16 | 33.7, 80.6, 114.8, 18.7, 248 | 1704, 37.7, 58.2, 65.6, 1865

Stratix-III | 69.6, 51.2 | 8192, 18 | 26.9, 114.1, 189.2, 16.5, 347 | 1755.8, 33.8, 57.5, 60.0, 1907

Dynamic power dissipation: pipelined mode

Cyclone-III | 125.5, 94.7 | 512, 8 | 5.2, 30.8, 42.7, 22.0, 101 | 171.9, 16, 67.8, 37.8, 294

Cyclone-III | 122, 67.1 | 2048, 12 | 6, 50.5, 150.6, 22.2, 229 | 342.9, 55.6, 79.8, 40.8, 519

Cyclone-III | 120, 53.3 | 4096, 16 | 6.4, 85.0, 297.1, 22.4, 411 | 373.5, 23.7, 84.8, 40.2, 522

Cyclone-III | 118, 47.4 | 8192, 18 | 6.4, 124.0, 591.9, 22.9, 745 | 518.8, 24.7, 91.7, 42.0, 677

Stratix-III | 240, 258.3 | 512, 8 | 28.2, 76.0, 112.7, 46.6, 264 | 284.3, 45.5, 186.8, 103.3, 610

Stratix-III | 239, 199.0 | 2048, 12 | 29.2, 156.6, 374.4, 46.2, 606 | 478, 70.4, 248.9, 121.4, 919

Stratix-III | 237, 158.8 | 4096, 16 | 30.2, 234.9, 720.8, 46.9, 1033 | 906.5, 69.3, 297.2, 123.6, 1397

Stratix-III | 231, 139.7 | 8192, 18 | 29.50, 430, 1409.9, 47.8, 1917 | 1075.6, 67.9, 308.2, 122.8, 1575

#TC: number of ternary coefficients; #W: multi-bit filter bit width; CC: combinational circuits; CCB: clock control block; Reg: registers.

Total: sum of the dynamic power dissipation of CC, CCB, Reg and I/O.

Table 6 Dynamic power dissipation: F8K process.

Device | TClk (kHz), MClk (kHz) | #TC, #W | Ternary (mW)¹: CC, CCB, Reg, I/O, Total | Multi-bit (mW): CC, CCB, Reg, I/O, Total

Dynamic power dissipation: non-pipelined mode

Cyclone-III | 64, 8 | 512, 8 | 0, 0, 0, 3.0, 3.0 | 0.05, 0, 0, 5.91, 6.0

Cyclone-III | 256, 8 | 2048, 12 | 0, 0, 0, 3.4, 3.4 | 0.09, 0, 0, 7.41, 7.5

Cyclone-III | 512, 8 | 4096, 16 | 0, 0, 0, 3.6, 3.6 | 0.17, 0, 0, 8.91, 9.1

Cyclone-III | 1024, 8 | 8192, 18 | 0.01, 0.57, 0.92, 3.75, 5.25 | 0.24, 0, 0, 9.66, 9.9

Stratix-III | 64, 8 | 512, 8 | 0, 0, 0, 0.85, 0.85 | 0.04, 0, 0, 1.77, 1.8

Stratix-III | 256, 8 | 2048, 12 | 0, 0, 0, 1.0, 1.0 | 0.08, 0, 0, 2.26, 2.3

Stratix-III | 512, 8 | 4096, 16 | 0, 0, 0, 1.0, 1.0 | 0.16, 0, 0.01, 2.73, 2.9

Stratix-III | 1024, 8 | 8192, 18 | 0, 0.97, 0.94, 1.1, 3.0 | 0.21, 0, 0.01, 2.97, 3.2

Dynamic power dissipation: pipelined mode

Cyclone-III | 64, 8 | 512, 8 | 0, 0, 0, 3.00, 3.0 | 0.02, 0, 0.01, 5.91, 6.0

Cyclone-III | 256, 8 | 2048, 12 | 0, 0, 0, 3.37, 3.4 | 0.03, 0, 0.01, 7.41, 7.5

Cyclone-III | 512, 8 | 4096, 16 | 0, 0, 0, 3.78, 3.8 | 0.06, 0, 0.01, 8.91, 9.0

Cyclone-III | 1024, 8 | 8192, 18 | 0.01, 0.52, 1.64, 4.19, 6.4 | 0.09, 0, 0.02, 9.66, 9.8

Stratix-III | 64, 8 | 512, 8 | 0, 0, 0, 0.84, 0.8 | 0.01, 0, 0.01, 1.77, 1.8

Stratix-III | 256, 8 | 2048, 12 | 0, 0, 0, 0.98, 1.0 | 0.02, 0, 0.01, 2.26, 2.3

Stratix-III | 512, 8 | 4096, 16 | 0, 0, 0, 1.04, 1.0 | 0.06, 0, 0.02, 2.73, 2.8

Stratix-III | 1024, 8 | 8192, 18 | 0, 0.84, 1.77, 1.95, 4.6 | 0.06, 0, 0.02, 2.97, 3.1

#TC: number of ternary coefficients; #W: multi-bit filter bit width; CC: combinational circuits; CCB: clock control block; Reg: registers.

Total: sum of the dynamic power dissipation of CC, CCB, Reg and I/O. ¹ The low (8 K) clock rate in this case results in power dissipation levels in the nW range, recorded above as zero. The power components might not sum exactly to the total power due to rounding.


Table 4 summarizes the clock frequencies used for the simulated filter implementations in the Cyclone-III and Stratix-III devices. In the F8K case, the clocks of both filters are the same in pipelined and non-pipelined modes because they depend only on FS. In the FMAX processes, however, the clock frequency varies according to the values obtained for the two modes in Tables 2 and 3.

The simulated dynamic power results are presented in Tables 5 and 6. As expected, because the single-bit filters operate at high frequencies, most of their power is dissipated by the registers and their associated clock tree. This share starts at around 50% for the low clock-rate Cyclone-based filters and rises to more than 96% for the high-performance Stratix implementations. Multi-bit filters, on the other hand, dissipate most of their power in their combinational circuits, with little (<10%) resulting from register and clock operation. In both filters, I/O power is relatively constant or grows linearly as the order of the filter increases.

It can be seen that, despite their much higher clock rates, the single-bit filters typically dissipate a small fraction of the power of their corresponding multi-bit filters, e.g., around 20% in non-pipelined mode (Table 5). However, in pipelined mode, the larger number of registers (plus their control blocks) means that the single-bit filter power may exceed that of the multi-bit case (by around 10 to 20%) at the maximum clock rates for these filters.
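These ratios can be checked against the totals in Table 5. The sketch below transcribes only the Total columns (in mW) and prints the ternary-to-multi-bit power ratio for each configuration.

```python
# Total dynamic power (mW) from Table 5: (#TC, ternary total, multi-bit total)
totals = {
    ("Cyclone-III", "non-pipelined"): [(512, 61, 335), (2048, 94, 414),
                                       (4096, 138, 658), (8192, 206, 831)],
    ("Stratix-III", "non-pipelined"): [(512, 156, 757), (2048, 203, 947),
                                       (4096, 248, 1865), (8192, 347, 1907)],
    ("Cyclone-III", "pipelined"): [(512, 101, 294), (2048, 229, 519),
                                   (4096, 411, 522), (8192, 745, 677)],
    ("Stratix-III", "pipelined"): [(512, 264, 610), (2048, 606, 919),
                                   (4096, 1033, 1397), (8192, 1917, 1575)],
}

for (device, mode), rows in totals.items():
    ratios = ", ".join(f"TC={tc}: {t / m:.0%}" for tc, t, m in rows)
    print(f"{device} {mode}: ternary/multi-bit = {ratios}")
```

The non-pipelined ratios fall between roughly 13% and 25%. In pipelined mode, the ternary filter overtakes the multi-bit filter only at TC = 8192, by about 10% on Cyclone-III and about 22% on Stratix-III.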

On the other hand, the low clock frequencies of the F8K process result in very small absolute power dissipation for both filter types, with I/O power dominating in all cases. Even so, the multi-bit filters can be seen (Table 6) to consume between 1.5 and 3 times the power of their equivalent single-bit filters. Of the filter configurations studied in this work, the single-bit filter power exceeded that of its corresponding multi-bit case only for the largest filter (8192 coefficients), using the most aggressive technology (Stratix-III) in fully pipelined mode (i.e., with the greatest number of registers), and then by about 30%.

It is also worth noting that these single-bit filters are capable of significantly higher maximum bandwidths than their multi-bit counterparts, as their simpler structure results in much higher equivalent processing rates (i.e., FMAX/OSR). Further, even when operating at high clock rates, their power dissipation is lower than that of the corresponding multi-bit filter. This offers an additional level of flexibility: power, area and performance can be traded off over a wider range of filter spectral characteristics than is possible in the multi-bit domain.

6 Conclusion

In this paper, we have examined the area-performance-power tradeoffs implicit in VHDL implementations of single-bit and multi-bit FIR filters. In general terms, we have found that using single-bit techniques in an FPGA environment results in superior performance at the cost of slightly more area (at higher filter orders). This is largely due to the internal organization of typical FPGA architectures. We found that a clock frequency in the range of 250 MHz (Table 3) is easily achievable in a high-performance FPGA device. This would readily handle a 4 MHz video stream at an OSR of 64 in pipelined mode, using about 5% of the available area of a mid-range commercial FPGA device.

The dynamic power dissipation figures of both filter types were compared at equal clock rates as well as at the highest clock rates for which equivalent performance could be established. In all cases, the maximum performance of the multi-bit filters was the limiting constraint. At almost all clock rates, single-bit filters were found to dissipate significantly less power than their equivalent multi-bit filters. The largest filter studied in this work is the only case in which the single-bit filter power exceeds that of its corresponding multi-bit case, and then only with the highest-performance technology in fully pipelined mode.

Although we have analyzed the FIR filter here using an FPGA implementation, the ΣΔM-based balanced single-bit FIR filter can achieve high performance without requiring built-in DSP components such as fast parallel multipliers.

References

1. Benvenuto, N., Franks, L. E., & Hill, F. S. (1985). Realization of finite impulse response filters using coefficients +1, 0, and −1. IEEE Transactions on Communications, COMM-33(10).

2. Memon, T. D., Beckett, P., & Hussain, Z. M. (2009). Design and implementation of ternary FIR filter using sigma delta modulation. In Proc. ISCCC’09, 9–11 October, Singapore (pp. 169–173).

3. Wong, P. W. (1992). Fully sigma-delta modulation encoded FIR filters. IEEE Transactions on Signal Processing, 40(6), 1605–1610.


4. Wong, P. W., & Gray, R. M. (1990). FIR filters with sigma-delta modulation encoding. IEEE Transactions on Acoustics, Speech, and Signal Processing, 38, 979–990.

5. Chen, C., & Wilson, A. N. (1998). Higher order sigma-delta modulation encoding for the design of multiplierless FIR filters. IEE Electronics Letters, 34(24), 2298–2300.

6. Sadik, A. Z., Hussain, Z. M., & O’Shea, P. (2006). An adaptive algorithm for ternary filtering. IEE Electronics Letters, 42(7), 420–420.

7. Thompson, A. C., O’Shea, P., Hussain, Z. M., & Steele, B. R. (2004). Efficient single-bit ternary digital filtering using sigma-delta modulator. IEEE Letters on Signal Processing, 11(2), 164–166.

8. Memon, T. D., Beckett, P., & Sadik, A. Z. (2009). Performance-area trade-offs in the design of short word length FIR filter. In Proc. ICMENS’09, 28–30 December (pp. 67–71).

9. Memon, T. D., Beckett, P., & Sadik, A. Z. (2009). Single-bit and conventional FIR filter comparison in state-of-art FPGA. In Proc. ICMENS’09, 28–30 December (pp. 72–76).

10. Sadik, A. Z., Hussain, Z. M., & O’Shea, P. (2005). A single-bit digital DC-blocker using ternary filtering. In Proc. Tencon’05.

11. Grover, R. S., Shang, W., & Li, Q. (2002). A faster distributed arithmetic architecture for FPGAs. Paper presented at FPGA’02, Monterey, California, USA, February 24–26.

12. Yoo, H., & Anderson, D. V. (2005). Hardware-efficient distributed arithmetic architecture for high-order digital filters. In IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), March 2005 (Vol. 5, pp. 125–128).

13. Meher, P. K., Chandrasekaran, S., & Amira, A. (2008). FPGA realization of FIR filter by efficient and flexible systolization using distributed arithmetic. IEEE Transactions on Signal Processing, 56(7), 3009–3017.

14. Jang, Y., & Yang, S. (2002). Low-power CSD linear phase FIR filter structure using vertical common sub-expression. Electronics Letters, 38(15), 777–779.

15. Dempster, A. G., & Macleod, M. D. (1995). Use of minimum-adder multiplier blocks in FIR digital filters. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 42(9), 569–577.

16. Shanthala, S., & Kulkarni, S. Y. (2009). High speed and low power FPGA implementation of FIR filter for DSP applications. European Journal of Scientific Research, 31(1), 19–28.

17. Hawley, R. A., Wong, B. C., Lin, T.-J., Laskowski, J., & Samueli, H. (1996). Design techniques for silicon compiler implementations of high-speed FIR digital filters. IEEE Journal of Solid-State Circuits, 31(5), 656–667.

18. Memon, T. D., Beckett, P., & Hussain, Z. M. (2009). Analysis and design of ternary FIR filter using sigma delta modulation. In Proc. INMIC 2009 (pp. 476–480).

19. Macpherson, K. N., & Stewart, R. W. (2006). Area efficient FIR filters for high speed FPGA implementation. IEE Proceedings - Vision, Image and Signal Processing, 153(6), 711–720.

20. Li, Y., Peng, C., Yu, D., & Zhang, X. (2008). The implementation methods of high speed FIR filter on FPGA. In Proc. ICSICT’08 (pp. 2216–2219).

21. Wiatr, K., & Jamro, E. (2000). Constant coefficient multiplication in FPGA structures. In Proceedings of the 26th Euromicro Conference (Vol. 1, pp. 252–259).

22. Schreier, R., & Temes, G. C. (2005). Understanding delta-sigma data converters. New Jersey: IEEE Press.

23. Pervez, A. M., Sorensen, H. V., & Spiegel, J. V. D. (1996). An overview of sigma-delta converters. IEEE Signal Processing Magazine, 61–84.

24. Thompson, A. C., Hussain, Z. M., & O’Shea, P. (2003). Performance of a new single-bit ternary filtering system. In Proc. ATNAC’03.

25. Ng, C.-W., Wong, N., & Ng, T.-S. (2007). Bit-stream adder and multiplier for tri-level sigma-delta modulators. IEEE Transactions on Circuits and Systems II: Express Briefs, 54(12), 1082–1086.

26. Johns, D. A., & Lewis, D. M. (1993). Design and analysis of delta-sigma based IIR filters. IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, 40(4), 233–240.

27. Wong, P. W. (1992). Fully sigma-delta modulation encoded FIR filters. IEEE Transactions on Signal Processing, 40(6), 1605–1610.

28. Thompson, A. C., Hussain, Z. M., & O’Shea, P. (2003). A correlative criterion for the stability of sigma-delta based IIR filter: Application to an FIR-like bit-stream filter. In 2nd WSEAS International Conference on Electronics, Control and Signal Processing, Singapore.

29. Losada, R. A. (2008). Digital filters with MATLAB. MathWorks Inc.

30. Mehboob, R., Khan, S. A., & Qamar, R. (2009). FIR filter design methodology for hardware optimized implementation. IEEE Transactions on Consumer Electronics, 55(3), 1669–1673.

31. Asato, C., Ditzen, C., & Dholakia, S. (1990). A data-path multiplier with automatic insertion of pipeline stages. IEEE Journal of Solid-State Circuits, 25(2), 383–387.

32. Altera Inc. (2009). Quartus-II Handbook Version 9.1 (Vol. I: Design and Synthesis). Altera Corporation.

33. Veendrick, H. J. M. (1984). Short-circuit dissipation of static CMOS circuitry and its impact on the design of buffer circuits. IEEE Journal of Solid-State Circuits, 19, 468–473.

Tayab Memon received a BE (Hons) in Electronics Engineering (First Class) and a PG Diploma in Telecommunication and Control Engineering (First Class) from Mehran University of Engineering & Technology, Jamshoro, Pakistan, in 2003 and 2006 respectively. He is currently working towards his PhD at RMIT University, Melbourne, Australia. His research interests include short word length DSP systems, wireless communication and their FPGA-based implementation.


Paul Beckett received his B. Eng. (1975), M. Eng. (1988) and PhD (2007) degrees from the Royal Melbourne Institute of Technology (now RMIT University), Melbourne, Australia. He is currently a Senior Lecturer in the School of Electrical and Computer Engineering at RMIT University. His research interests include computer architecture, reconfigurable systems and integrated circuit design.

Amin Z. Sadik received his BEng (Baghdad, 1983), MEng (Baghdad, 1988) and PhD (2006) in Electrical Engineering (Digital Signal Processing) from RMIT University, Melbourne, Australia. He has worked with the University of Technology, Baghdad; Salahaddin University, Erbil; Al Balqaa University, Tafila, Jordan; RMIT, Melbourne; and QUT, Brisbane, Australia. He has received several international academic awards. His research interests include digital signal processing, image processing and digital communications.
