[ieee 2009 norchip - trondheim, norway (2009.11.16-2009.11.17)] 2009 norchip - recursive fir filter...

RECURSIVE FIR FILTER STRUCTURES ON FPGA

Tarja Tauren and Olli Vainio

Tampere University of Technology, Department of Computer Systems

P.O. Box 553, FIN-33101 Tampere, Finland

Raija Lehto

Tampere University of Technology, Department of Signal Processing

P.O. Box 553, FIN-33101 Tampere, Finland

ABSTRACT

A new approach to piecewise-polynomial approximation and recursive implementation structures for linear-phase Finite Impulse Response (FIR) filters have been recently proposed. In this paper, we describe hardware prototype implementations of the new structures for all four types of linear-phase FIR filters using a Field Programmable Gate Array (FPGA) based platform. Narrowband lowpass filters and narrowband differentiators are used as design examples to demonstrate the functionality and efficiency of the implementations. The required wordlength and resource usage are analyzed.

Index terms: Filter structure, finite impulse response (FIR) digital filters, FPGA, piecewise polynomial

1. INTRODUCTION

Digital FIR filters are widely used in many signal processing applications due to their many well-known favorable properties and popular design methods.¹ Their main drawback is high arithmetic complexity, i.e., the number of adders, multipliers and delay elements needed in conventional implementations, especially when a narrow transition band is required [1].

The impulse response of a narrowband FIR filter generally has a smooth shape, and there is strong correlation between successive coefficient values. Therefore, piecewise-polynomial approximation of the impulse response can give significant savings in arithmetic complexity. On the other hand, polynomial responses can be efficiently generated using recursive structures based on cascaded accumulators [2].
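The link between cascaded accumulators and polynomial responses can be illustrated with a minimal sketch. It is not the structure of [2]; it only assumes ideal integer accumulators driven by a unit impulse, and the function names are hypothetical. Each additional accumulator stage raises the degree of the generated polynomial by one.

```python
def accumulate(seq):
    """One accumulator stage: y(n) = y(n-1) + x(n)."""
    total, out = 0, []
    for value in seq:
        total += value
        out.append(total)
    return out


def cascaded_accumulators(seq, stages):
    """Pass the sequence through `stages` accumulators in series."""
    for _ in range(stages):
        seq = accumulate(seq)
    return seq


N_SAMPLES = 8
impulse = [1] + [0] * (N_SAMPLES - 1)

step = cascaded_accumulators(impulse, 1)   # degree 0: constant 1
ramp = cascaded_accumulators(impulse, 2)   # degree 1: n + 1
cubic = cascaded_accumulators(impulse, 4)  # degree 3: (n+1)(n+2)(n+3)/6
```

With four stages the output is a cubic in n, which corresponds to the polynomial degree L = 3 used in the design examples of this paper.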

Boudreaux and Parks were among the first to propose a recursive piecewise-polynomial approximation for the impulse response of an FIR filter [3]. This principle was further generalized by Chu and Burrus [4],[5]. Piecewise-polynomial approximations have also been studied in [6],[7]. More recently, new efficient recursive structures for FIR filter implementation have been developed by Lehto, Saramaki and Vainio [8],[9]. In their approach, the optimization of the coefficients for the piecewise-polynomial approximation is done by linear programming. A recursive structure can be found for all four types of linear-phase FIR filters.

¹ This work was supported by the Academy of Finland under grant 127858.

The objective of this work is to demonstrate the new recursive FIR structures on an FPGA board. In the next section, we briefly describe the principles of the proposed structures. Section 3 gives a summary of the Altera platform used in this work. Section 4 gives results of the design examples for all four types of linear-phase FIR filters. Section 5 concludes the paper.

2. FILTER STRUCTURES

The impulse response is symmetrical for FIR filter Types 1 and 2, and anti-symmetrical for Types 3 and 4. The principle behind the piecewise-polynomial approximation, in a nutshell, is to divide the overall impulse response into subresponses and to generate each subresponse with polynomials of a given degree. The subresponses are of different lengths, and summing them gives the overall shape, with a different number of polynomials up to the center of symmetry in each block of the overall impulse response. All the subresponses are centered to the middle. Thus, the first subresponse consists of one polynomial, the second of two polynomials, etc. The center subresponse consists of the highest number, M, of polynomials. The most advantageous degree of the polynomials is typically L = 3, as a higher degree requires more coefficients while increasing the accuracy only slightly.

The general implementation structure for Type 1 filters is shown in Fig. 1. For the sake of clarity of the diagram, the coefficients $\alpha_k^{(m)}$ have been drawn twice. In a practical implementation, the inputs of the left-hand side and right-hand side $\alpha_k^{(m)}$s [$-\alpha_k^{(m)}$s] are added [subtracted] and the result is multiplied by $\alpha_k^{(m)}$. For any M, the accumulators on the right-hand side are implemented only once, but the larger M is, the more delays and α coefficients are needed, and the structure expands horizontally. The polynomial degree is L. Thus increasing L makes the structure grow in the vertical dimension in Fig. 1.

978-1-4244-4311-6/09/$25.00 ©2009 IEEE

Fig. 1. Implementation structure for Type 1 FIR filters [8].

The structure is slightly modified for the other FIR filter types due to the different symmetry properties.

To generate responses of finite length from a structure that includes feedback loops, pole-zero cancellation is essential. Therefore the following important considerations arise. Quantization of the polynomial coefficients should be done before deriving the coefficients in the structure. Because of the feedback loops, the structure does not automatically recover from temporary data errors in the accumulators. To ensure pole-zero cancellation in the long run, a switching and resetting principle as discussed in [8] can be used, i.e., two structures are implemented in parallel such that the data registers are doubled. One of the modules is operational while the other is being reset, and the roles are periodically switched. Such a structure is shown in Fig. 2.
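Both points, exact pole-zero cancellation and the lack of recovery from a transient error, can be seen in the simplest recursive FIR filter, a length-T running sum. The sketch below is only an illustration under that simplifying assumption; it is not the piecewise-polynomial structure of [8], and the function names and the injected error are hypothetical.

```python
def direct_running_sum(x, T):
    """Direct-form FIR filter with h(n) = 1 for n = 0, ..., T-1."""
    return [sum(x[max(0, n - T + 1): n + 1]) for n in range(len(x))]


def recursive_running_sum(x, T, glitch_at=None, glitch=0):
    """y(n) = y(n-1) + x(n) - x(n-T).

    The accumulator pole at z = 1 is cancelled by a zero of 1 - z^(-T).
    """
    acc, out = 0, []
    for n, value in enumerate(x):
        delayed = x[n - T] if n >= T else 0
        acc += value - delayed
        if n == glitch_at:
            acc += glitch  # simulated one-off data error in the accumulator
        out.append(acc)
    return out


x = [3, -1, 4, 1, -5, 9, 2, -6, 5, 3, 0, 0, 0, 0, 0, 0]
T = 4

reference = direct_running_sum(x, T)
exact = recursive_running_sum(x, T)
faulty = recursive_running_sum(x, T, glitch_at=5, glitch=7)
errors = [f - r for f, r in zip(faulty, reference)]
```

In integer arithmetic the recursive output equals the direct-form output sample by sample. A single error injected at n = 5 stays in the output for all later samples, even after the input has returned to zero, which is why a periodic reset is needed.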

Two's complement arithmetic is used. It has the useful property of allowing overflows in intermediate additions as long as the final result is in the range $[-1, 1 - 2^{-b}]$, where b is the number of fractional bits. It is also beneficial to use double wordlength in accumulators to minimize quantization errors. Worst-case scaling is required, i.e., the sum of the absolute values of the impulse response has to be less than or equal to unity. If necessary, the input signal or the coefficients are scaled accordingly.
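The overflow property can be checked with a small sketch. It assumes an illustrative 8-bit word (one sign bit and seven fractional bits, far shorter than the wordlengths used in this paper) and represents values as integers in units of 2^(-7). The helper names are hypothetical.

```python
BITS = 8  # illustrative only: 1 sign bit + 7 fractional bits


def wrap(value, bits=BITS):
    """Reduce an integer to the two's complement range of `bits` bits."""
    half = 1 << (bits - 1)
    return (value + half) % (1 << bits) - half


def worst_case_scale(h):
    """Scale h so that the sum of absolute values does not exceed unity."""
    total = sum(abs(c) for c in h)
    return [c / total for c in h] if total > 1 else list(h)


# Terms in units of 2^-7. The true sum, 40, lies inside [-128, 127].
terms = [100, 90, -80, -70]
acc = 0
partial = []
for t in terms:
    acc = wrap(acc + t)  # intermediate results may overflow and wrap
    partial.append(acc)

scaled = worst_case_scale([0.5, 1.0, 0.5])
```

The partial sums are 100, -66, 110 and 40: the second and third have wrapped around, yet the final value equals the true sum because the additions are performed modulo 2^8. Worst-case scaling guarantees that the final result is always in range, which is the condition required for this property to hold.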

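[Figure: the input x(n) is split into x1(n) and x2(n), each filtered by a copy of H(z), producing y1(n) and y2(n), from which the output y(n) is formed.]

Fig. 2. Practical implementation based on switching and resetting, where a demultiplexer is used to decompose the input x(n) into two signals x1(n) and x2(n) [8].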
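The switching and resetting principle can be sketched in software as follows. The sketch makes several assumptions that are not taken from [8]: H(z) is replaced by a simple recursive running sum of length T, the roles are switched every `period` samples with period ≥ T, and the two module outputs are added. All names are hypothetical.

```python
class RunningSum:
    """Recursive length-T running sum: y(n) = y(n-1) + x(n) - x(n-T)."""

    def __init__(self, T):
        self.T = T
        self.reset()

    def reset(self):
        self.acc = 0
        self.delay = [0] * self.T

    def step(self, value):
        self.acc += value - self.delay.pop(0)
        self.delay.append(value)
        return self.acc


def switched_filter(x, T, period, glitch_at=None, glitch=0):
    """Alternate two modules every `period` samples (period >= T assumed)."""
    modules = [RunningSum(T), RunningSum(T)]
    out = []
    for n, value in enumerate(x):
        active = (n // period) % 2
        if n % period == 0:
            # The module taking over has received zeros for a whole period,
            # so its ideal state is zero and the reset only removes errors.
            modules[active].reset()
        if n == glitch_at:
            modules[active].acc += glitch  # simulated transient error
        # The idle module is fed zeros while its response tail decays.
        out.append(modules[active].step(value) + modules[1 - active].step(0))
    return out


T, PERIOD = 4, 6
x = [(7 * n) % 11 - 5 for n in range(40)]
reference = [sum(x[max(0, n - T + 1): n + 1]) for n in range(len(x))]

clean = switched_filter(x, T, PERIOD)
faulty = switched_filter(x, T, PERIOD, glitch_at=2, glitch=9)
errors = [f - r for f, r in zip(faulty, reference)]
```

Without errors the summed output equals the direct-form reference. An error injected at n = 2 persists only until the affected module is next reset, here at n = 2·PERIOD = 12, after which the output is correct again.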

3. FPGA PLATFORM

The hardware platform used in this work is the Altera DE2 development and education board that is based on the Altera Cyclone II EP2C35 FPGA [10]. Some of the features of the board are the following:

- 16-Mbit serial configuration device
- Built-in USB interface
- 8-MBytes SDRAM, 512K SRAM, 4-MBytes Flash
- 18 toggle switches, four pushbutton switches
- 18 red LEDs, 9 green LEDs
- 16 x 2 LCD display, eight 7-segment displays
- 50 MHz and 27 MHz crystal oscillators
- RS232, Infrared port, PS/2, 10/100 Ethernet
- Video in and video out
- Expansion headers (76 signal pins)

The key features of the Cyclone II EP2C35 FPGA are the following:

- 33216 Logic Elements
- 105 M4K RAM blocks
- 483,840 total RAM bits
- 35 embedded 18 x 18 multipliers
- Four PLLs
- 475 user I/O pins

VHDL description was used for design entry in a PC environment. Simulations were carried out using Modelsim from Mentor Graphics [11]. Matlab [12] was used for coefficient optimization, early structural verification and for converting the implementation results between the time domain and frequency domain. The synthesis tool used in this work was Altera's Quartus II. The software also includes a tool for downloading the synthesized design into the FPGA. The functionality of the implementation was inspected with a software-based logic analyzer called SignalTap Logic Analyzer.

4. FILTER IMPLEMENTATIONS

In this section, we show the implementation results for design examples of the four types of linear-phase FIR filters. Narrowband lowpass filters are implemented as Types 1 and 2, and narrowband differentiators as Types 3 and 4. The design specifications are the following:

- Passband edge ωp = 0.025π
- Stopband edge ωs = 0.050π
- Passband ripple δp = 0.01
- Stopband ripple δs = 0.001 (−60 dB)

For all of these designs, the polynomial degree is L = 3 and the number of slices (see the definition in [8], Eq. (16a)-(16b)) is M = 5, i.e., the overall impulse response consists of a total of 10 blocks and 5 subresponses. Altogether 22 coefficients are needed in the implementation: 20 α coefficients and 2 β coefficients. The number of implementation coefficients depends only on the polynomial degree and the number of slices, not on the actual filter length.

4.1. Type 1 FIR Filter

A Type 1 FIR filter is characterized by the symmetry property h(N − n) = h(n), n = 0, 1, ..., N, where N is even.
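The four linear-phase types are distinguished by the symmetry of the impulse response and by the parity of the order N. A small helper, given here only as an illustrative sketch with a hypothetical function name and exact equality tests (suitable for quantized integer coefficients), makes the classification explicit.

```python
def linear_phase_type(h):
    """Return the linear-phase type (1 to 4) of h(0..N), or None."""
    N = len(h) - 1  # filter order; the filter length is N + 1
    symmetric = all(h[N - n] == h[n] for n in range(N + 1))
    antisymmetric = all(h[N - n] == -h[n] for n in range(N + 1))
    if symmetric and not antisymmetric:
        return 1 if N % 2 == 0 else 2
    if antisymmetric and not symmetric:
        return 3 if N % 2 == 0 else 4
    return None  # not linear-phase, or identically zero


types = [
    linear_phase_type([1, 2, 1]),     # symmetric, N = 2
    linear_phase_type([1, 2, 2, 1]),  # symmetric, N = 3
    linear_phase_type([1, 0, -1]),    # antisymmetric, N = 2
    linear_phase_type([1, -1]),       # antisymmetric, N = 1
]
```

For the examples in this section, a filter of length 223 has N = 222 (even) and a filter of length 224 has N = 223 (odd).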

Using the piecewise-polynomial approximation, the specifications are met by a filter of length 223. The minimum wordlength for which the specifications are met is 1+33 bits. The impulse response and the magnitude response are shown in Fig. 3. Table 1 shows the quantized implementation coefficients in two's complement arithmetic of our Type 1 lowpass filter. As seen in Table 1, the quantized coefficients have quite small values, which means that there are several leading zeros, namely 12 in this case. This means that the effective number of bits is 21. Table 2 shows the delays used in our example.
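The count of leading zeros can be reproduced from the largest coefficient magnitude in Table 1. The sketch below assumes that "1+33 bits" means one sign bit and 33 fractional bits, and that the number of leading zeros is set by the largest magnitude among the coefficients; the variable names are hypothetical.

```python
import math

FRACTION_BITS = 33               # from the 1+33-bit wordlength
largest = 1.343471230939030e-04  # largest magnitude in Table 1 (slice 1, alpha_1)

# For 2^-(k+1) <= v < 2^-k, the binary fraction of v starts with k zero bits.
leading_zeros = math.ceil(-math.log2(largest)) - 1
effective_bits = FRACTION_BITS - leading_zeros
```

Since 2^(-13) < 1.34 × 10^(-4) < 2^(-12), the sketch gives 12 leading zeros and 21 effective bits, in agreement with the values stated above.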

[Figure: (a) "Impulse Response" versus n (samples); (b) "Magnitude Response (dB)", amplitude in dB versus normalized frequency (×π rad/sample), with an inset showing the passband in linear amplitude.]

Fig. 3. (a) Impulse response and (b) magnitude response for Type 1 design example.

The roundoff error in the finite wordlength implementation of the impulse response is shown in Fig. 4. This figure thus shows the difference between the design result from Matlab and the actual implementation.

The implementation of a single instance of the structure uses 2265 Logic Elements (LEs), corresponding to 7% of the FPGA capacity. The number of memory bits needed is 44032 (9%) and the number of registers is 1204. A switching and resetting structure uses 4206 LEs (13%), 44032 memory bits, and 1580 registers.

[Figure: roundoff error versus n (samples); the vertical axis is scaled by 10^(−6).]

Fig. 4. Roundoff error in the impulse response for the Type 1 filter.

Table 1. Quantized implementation coefficients in two's complement form of the Type 1 lowpass filter.

Slice m   Coefficient   Value
1         α1            −1.343471230939030E−04
1         α2            −2.206047065556050E−05
1         α3            −4.925997927784920E−06
1         α4             9.255018085241320E−07
2         α1             2.699275501072410E−05
2         α2            −2.851814497262240E−05
2         α3             4.577450454235080E−06
2         α4            −3.453344106674190E−06
3         α1            −1.333394320681690E−04
3         α2             7.375888526439670E−05
3         α3            −1.375330612063410E−05
3         α4             9.295530617237090E−06
4         α1            −9.722949471324680E−05
4         α2            −1.047406112775210E−04
4         α3            −3.310199826955800E−05
4         α4            −1.595565117895600E−05
5         α1            −1.252338988706470E−04
5         α2            −7.988919969648120E−05
5         α3            −4.153233021497730E−05
5         α4             6.272457540035250E−07

          β2             7.022568024694920E−05
          β4             1.712143421173100E−05

Table 2. Delays T_k of each slice of the lowpass Type 1 filter.

Delay T_k
T1 = 23      T2 = 27   T3 = 31   T4 = 17   T5(1) = 13
T5(2) = 14   T4 = 17   T3 = 31   T2 = 27   T1 = 23
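As a consistency check on Table 2, the ten delays can be summed. The sketch below assumes that each delay corresponds to the length of one of the 10 blocks of the impulse response; this interpretation is not stated explicitly here, but it is supported by the sum matching the filter length of 223.

```python
# Delays from Table 2, read left to right, top row first.
delays = [23, 27, 31, 17, 13, 14, 17, 31, 27, 23]

total = sum(delays)

# The outer four blocks mirror each other around the center of symmetry.
outer_left = delays[:4]
outer_right_reversed = delays[:5:-1]

# The two center blocks differ by one sample because the length is odd.
center = (delays[4], delays[5])
```

The delays add up to 223, and the outer blocks are mirror images, as expected for a symmetric impulse response.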


4.2. Type 2 FIR Filter

A Type 2 FIR filter is characterized by the symmetry property h(N − n) = h(n), n = 0, 1, ..., N, where N is odd.

The specifications are met by a filter of length 224. The minimum wordlength for which the specifications are met is 1+34 bits. The impulse response and the magnitude response are shown in Fig. 5.

[Figure: (a) "Impulse Response" versus n (samples); (b) "Magnitude Response (dB)", amplitude in dB, with an inset showing the passband in linear amplitude.]

Fig. 5. (a) Impulse response and (b) magnitude response for Type 2 design example.

The implementation uses 2338 LEs (7%), 44032 bits of memory (9%), and 1210 registers.


4.3. Type 3 FIR Filter

A Type 3 FIR filter is characterized by the antisymmetry property h(N − n) = −h(n), n = 0, 1, ..., N, where N is even. Because the filter should behave as a narrowband differentiator, the desired zero-phase frequency response in the passband is D(ω) = ω, and the minimum stopband attenuation is 60 dB.

The specifications are met by a filter of length 231, and the needed wordlength is 1+38 bits. The impulse response and the zero-phase magnitude response are shown in Fig. 6.

[Figure: (a) "Impulse Response" versus n (samples), vertical axis scaled by 10^(−3); (b) "Zero-phase Response", amplitude.]

Fig. 6. (a) Impulse response and (b) zero-phase response for Type 3 design example.

The implementation uses 2590 LEs (8%), 46080 bits of memory (10%), and 1257 registers.


4.4. Type 4 FIR Filter

A Type 4 FIR filter is characterized by the antisymmetry property h(N − n) = −h(n), n = 0, 1, ..., N, where N is odd. Also in this case a differentiator is desired.

The specifications are met by a filter of length 232, and the needed wordlength is 1+37 bits. The impulse response and the zero-phase magnitude response are shown in Fig. 7.

The implementation uses 2409 LEs (7%), 46080 bits of memory (10%), and 1252 registers.
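The utilization percentages quoted in this section follow from the device capacities listed in Section 3. The short sketch below recomputes them from the reported resource counts; the dictionary layout and names are only illustrative.

```python
LOGIC_ELEMENTS = 33216  # Cyclone II EP2C35, Section 3
RAM_BITS = 483840

# (logic elements, memory bits) as reported for each design example
designs = {
    "Type 1": (2265, 44032),
    "Type 1, switching and resetting": (4206, 44032),
    "Type 2": (2338, 44032),
    "Type 3": (2590, 46080),
    "Type 4": (2409, 46080),
}

# Percentages of the device capacity, rounded to the nearest integer
utilization = {
    name: (round(100 * les / LOGIC_ELEMENTS), round(100 * bits / RAM_BITS))
    for name, (les, bits) in designs.items()
}
```

The rounded values agree with the percentages given in the text, for example 7% of the logic elements and 9% of the memory bits for the single-instance Type 1 design.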

[Figure: (a) "Impulse Response" versus n (samples), vertical axis scaled by 10^(−3); (b) "Zero-phase Response", amplitude.]

Fig. 7. (a) Impulse response and (b) zero-phase response for Type 4 design example.

5. CONCLUSIONS

Recursive FIR filter implementation structures have been successfully demonstrated on FPGA for all four types of linear-phase FIR filters. The observed responses satisfy the design specifications. Significant computational savings are achieved compared to direct-form structures as the number of needed multiplications is much lower. Only a small fraction of the FPGA capacity is used in the example cases for filter lengths exceeding 200. There is very little difference in resource usage between the four types.
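To give a rough sense of the savings in multiplications, the 22 coefficients of the recursive structures can be compared with the number of distinct coefficients in a direct-form linear-phase filter of the same length. The direct-form counts below are not reported in this paper; they are computed under the standard assumption that coefficient symmetry is exploited, and the function name is hypothetical.

```python
def direct_form_multipliers(length, antisymmetric=False):
    """Distinct coefficients of a linear-phase direct form using symmetry."""
    if length % 2 == 0:
        return length // 2
    # Odd length: the center tap is zero for antisymmetric (Type 3) filters.
    return length // 2 if antisymmetric else length // 2 + 1


RECURSIVE_COEFFICIENTS = 22  # 20 alpha + 2 beta coefficients, Section 4

# Filter type -> (length, antisymmetric) from the design examples
examples = {1: (223, False), 2: (224, False), 3: (231, True), 4: (232, True)}

direct = {
    ftype: direct_form_multipliers(length, anti)
    for ftype, (length, anti) in examples.items()
}
savings = {ftype: count / RECURSIVE_COEFFICIENTS for ftype, count in direct.items()}
```

Under these assumptions the direct forms need between 112 and 116 multiplications per output sample, roughly five times the 22 coefficients of the recursive structures.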

The implementations are fully parallel such that each new sample is processed in one clock cycle. Because of the cascaded additions, the critical timing path is quite long. The achievable sampling rate for these designs is about 35 MHz.

It was found that the structures are numerically quite sensitive and the required wordlengths are relatively large. This is not a problem in customized FPGA implementations where resources can be allocated as needed, but good accuracy is needed in all phases of the design process.

6. REFERENCES

[1] T. Saramaki, "Finite impulse response filter design," in Handbook for Digital Signal Processing, S. K. Mitra and J. F. Kaiser, Eds. New York: Wiley, 1993, ch. 4, pp. 155-277.

[2] T. Saramaki and O. Vainio, "Structures for generating polynomial responses," in Proc. 37th Midwest Symp. Circuits and Systems, vol. 2, pp. 1315-1318, Aug. 1994.

[3] G. F. Boudreaux and T. W. Parks, "Thinning digital filters: A piecewise-exponential approximation approach," IEEE Trans. Acoust., Speech, Signal Process., vol. ASSP-31, pp. 105-113, Feb. 1983.

[4] S. Chu and S. Burrus, "Efficient recursive realizations of FIR filters, part I: The filter structures," Circuits, Systems and Signal Processing, vol. 3, pp. 2-20, 1984.

[5] S. Chu and S. Burrus, "Efficient recursive realizations of FIR filters, part II: Design and applications," Circuits, Systems and Signal Processing, vol. 3, pp. 21-57, 1984.

[6] T. G. Campbell and T. Saramaki, "Recursive linear-phase FIR filter structures with piecewise-polynomial impulse response," in Proc. 6th Int. Symp. Networks, Syst. Signal Process., Zagreb, Yugoslavia, June 1989, pp. 16-19.

[7] T. Saramaki and S. K. Mitra, "Design and implementation of narrow-band linear-phase FIR filters with piecewise polynomial impulse response," in Proc. IEEE Int. Symp. Circuits Syst., ISCAS'99, Orlando, FL, Jul. 1999, vol. 3, pp. 456-461.

[8] R. Lehto, T. Saramaki and O. Vainio, "Synthesis of narrowband linear-phase FIR filters with a piecewise-polynomial impulse response," IEEE Trans. Circuits Syst. I, vol. 54, no. 10, pp. 2262-2276, Oct. 2007.

[9] R. Lehto, Synthesis Methods for Linear-Phase FIR Filters with a Piecewise-Polynomial Impulse Response, doctoral dissertation, Tampere University of Technology, Tampere, Finland, 2009, Publications 814.

[10] http://www.altera.com/

[11] http://www.mentor.com/

[12] http://www.mathworks.com/
