implementation of dsp ic

Post on 09-Apr-2022


Page 1: Implementation of DSP IC

VSP Lecture4 - Fast Algorithms ([email protected])

Implementation of DSP IC

Lecture 4: Fast Algorithms for Digital Signal Processing

Page 2: Implementation of DSP IC

Algorithm Strength Reduction

• Strength reduction lowers hardware complexity by exploiting substructure sharing. The result is less silicon area or power consumption in a VLSI ASIC implementation, or a shorter iteration period in a programmable DSP implementation.

• Strength reduction enables the design of parallel FIR filters with a less-than-linear increase in hardware.

Page 3: Implementation of DSP IC

Algorithm Strength Reduction

• Motivation

– The number of strong operations, such as multiplications, is reduced, possibly at the expense of an increase in the number of weaker operations, such as additions.

• Reduce computational complexity

• Example: complex multiplication

– $(a+jb)(c+jd) = e+jf$, with $a, b, c, d, e, f \in \mathbb{R}$

– The direct implementation requires 4 multiplications and 2 additions:

$$\begin{bmatrix} e \\ f \end{bmatrix} = \begin{bmatrix} c & -d \\ d & c \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix}$$

– However, the number of multiplications can be reduced to 3, at the expense of 3 extra additions, by using the identities

$$ac - bd = a(c-d) + d(a-b), \qquad ad + bc = b(c+d) + d(a-b)$$

– Total: 3 multiplications and 5 additions
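The two identities can be checked numerically. The following is a minimal sketch; the function name `complex_mul_3` is an illustrative assumption, not something taken from the lecture.

```python
def complex_mul_3(a, b, c, d):
    """Return (e, f) with e + jf = (a + jb)(c + jd), using 3 real multiplications."""
    shared = d * (a - b)        # product shared by both outputs (1 mult, 1 add)
    e = a * (c - d) + shared    # equals ac - bd (1 mult, 2 adds)
    f = b * (c + d) + shared    # equals ad + bc (1 mult, 2 adds)
    return e, f


# (1 + 2j)(3 + 4j) = -5 + 10j
print(complex_mul_3(1.0, 2.0, 3.0, 4.0))
```

If c and d are fixed coefficients, c − d and c + d can be precomputed, which leaves 3 run-time additions.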

Page 4: Implementation of DSP IC

Complex Multiplication

Strength reduction lowers the number of strong operations (and hence the switched capacitance), but it lengthens the critical path.

Speed? Area? Power? …

Page 5: Implementation of DSP IC


Review of Discrete Fourier Transform

Page 6: Implementation of DSP IC


4 Forms of Fourier Analysis

“Sampled” frequency

Page 7: Implementation of DSP IC

Continuous-Time and Continuous-Frequency

Time domain: continuous, aperiodic

Frequency domain: continuous, aperiodic

Page 8: Implementation of DSP IC

Continuous-Time and Discrete-Frequency

Fourier series of periodic continuous signals

Time domain: periodic, continuous

Frequency domain: discrete, aperiodic

Page 9: Implementation of DSP IC

Discrete-Time and Continuous-Frequency

Fourier transform of aperiodic discrete signals

Time domain: discrete, aperiodic

Frequency domain: continuous, periodic

Page 10: Implementation of DSP IC

Discrete Fourier Transform

• The DFT is identical to samples of the Fourier transform.

• In DSP applications, we are able to store only a finite number of samples.

• We are able to compute the spectrum only at specific discrete values of ω.

Page 11: Implementation of DSP IC

Discrete Fourier Transform

• Discrete Fourier transform (DFT) pairs

$$X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{kn}, \quad k = 0, 1, \ldots, N-1$$

$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, W_N^{-kn}, \quad n = 0, 1, \ldots, N-1$$

where $W_N^{kn} = e^{-j\frac{2\pi}{N}kn}$

• DFT/IDFT can be implemented by using the same hardware

• It requires N² complex multiplications and N(N−1) complex additions (each output sample needs N complex multiplications and N−1 complex additions)
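As a rough illustration of the N² cost, the DFT pair can be evaluated directly from its definition. This is a minimal sketch in plain Python; the names `dft` and `idft` are illustrative assumptions.

```python
import cmath


def dft(x):
    """Direct DFT: X[k] = sum_n x[n] * W_N^(kn), with W_N = exp(-j*2*pi/N)."""
    N = len(x)
    return [sum(x[n] * cmath.exp(-2j * cmath.pi * k * n / N) for n in range(N))
            for k in range(N)]


def idft(X):
    """Direct inverse DFT: x[n] = (1/N) * sum_k X[k] * W_N^(-kn)."""
    N = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / N) for k in range(N)) / N
            for n in range(N)]


X = dft([1, 2, 3, 4])
x_back = idft(X)   # recovers the input up to rounding error
```

Both functions share the same double loop and differ only in the sign of the exponent and the 1/N scaling, which is why the same hardware can serve both directions.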

Page 12: Implementation of DSP IC

More About DFT

• Properties of the Discrete Fourier Transform

• Linear Convolution and the Discrete Fourier Transform

Page 13: Implementation of DSP IC

Periodic Sequence

• Consider a periodic sequence $\tilde{x}[n]$ of period N

• The sequence can be represented by a Fourier series:

$$\tilde{x}[n] = \frac{1}{N} \sum_{k} \tilde{X}[k]\, e^{j(2\pi/N)kn}$$

• The Fourier series for any discrete-time signal with period N requires only N harmonically related complex exponentials, because

$$e_{k+lN}[n] = e^{j(2\pi/N)(k+lN)n} = e^{j(2\pi/N)kn} = e_k[n]$$

so that

$$\tilde{x}[n] = \frac{1}{N} \sum_{k=0}^{N-1} \tilde{X}[k]\, e^{j(2\pi/N)kn}$$

Page 14: Implementation of DSP IC

Applying the orthogonality property, we have

Interchange the order of summation

The coefficients are also periodic with period N

Page 15: Implementation of DSP IC

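DFS Representation of a Periodic Sequence

Synthesis equation: $\tilde{x}[n] = \frac{1}{N} \sum_{k=0}^{N-1} \tilde{X}[k]\, W_N^{-kn}$

Analysis equation: $\tilde{X}[k] = \sum_{n=0}^{N-1} \tilde{x}[n]\, W_N^{kn}$

$\tilde{x}[n]$ and $\tilde{X}[k]$ are periodic sequences of period N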

Page 16: Implementation of DSP IC


Physical Significance

Let

One period

Then

Page 17: Implementation of DSP IC

$\tilde{X}[k]$ vs. $X(e^{j\omega})$

Example

Page 18: Implementation of DSP IC

Sampling the Fourier Transform

[Figure: N equally spaced samples on the unit circle, with angular spacing 2π/N]

Then

or

The sampling sequence is periodic with period N

Suppose exists

Since

Page 19: Implementation of DSP IC

Aliasing Problem 1

• x[n] is an infinite-length sequence

[Figure: x[n] and the periodic sequence $\tilde{x}[n]$ obtained from it]

Page 20: Implementation of DSP IC

Aliasing Problem 2

• If x[n] is a finite-length sequence, 0 ≤ n ≤ M−1

• Consider the case N < M

[Figure: x[n] and the periodic sequence $\tilde{x}[n]$ obtained from it]

Page 21: Implementation of DSP IC

Concluding Remarks

$\tilde{x}[n]$

The case N ≥ M

or

Page 22: Implementation of DSP IC

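Circular Shift of a Sequence

[Figure, N = 15: x[n]; its periodic extension $\tilde{x}[n]$; the periodic sequence shifted by 2; and one period of the shifted sequence, selected by the window $R_N[n]$. A circular shift corresponds to a rotation of the cylinder.]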

Page 23: Implementation of DSP IC

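Circular Shift of a Sequence

[Figure, N = 15: x[n]; its periodic extension $\tilde{x}[n]$; the periodic sequence shifted by 13; and one period of the shifted sequence, selected by the window $R_{15}[n]$. A circular shift corresponds to a rotation of the cylinder.]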

Page 24: Implementation of DSP IC

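Review of Convolution

• Given two sequences:

– Data sequence $x_i$, $0 \le i \le N-1$, of length N

– Filter sequence $h_i$, $0 \le i \le L-1$, of length L

• Linear convolution

$$y_i = x_i * h_i = h_i * x_i, \quad i = 0, 1, \ldots, L+N-2$$

– In the block diagram the output is written s = h * x: an L-point sequence h convolved with an N-point sequence x gives an (L+N−1)-point sequence s

– Direct computation requires NL multiplications

• Direct computation, for example 2-by-2 convolution:

$$\begin{bmatrix} s_0 \\ s_1 \\ s_2 \end{bmatrix} = \begin{bmatrix} h_0 & 0 \\ h_1 & h_0 \\ 0 & h_1 \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \end{bmatrix}$$

requires 4 multiplications and 1 addition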
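The NL multiplication count can be confirmed with a direct implementation. This is a minimal sketch; the function name `linear_convolve` and the multiplication counter are illustrative assumptions.

```python
def linear_convolve(x, h):
    """Direct linear convolution; returns (s, number of multiplications)."""
    N, L = len(x), len(h)
    s = [0] * (N + L - 1)          # output has L + N - 1 points
    mults = 0
    for i in range(N):
        for k in range(L):
            s[i + k] += x[i] * h[k]
            mults += 1             # one multiplication per (x_i, h_k) pair
    return s, mults


# 2-by-2 case: s0 = h0*x0, s1 = h1*x0 + h0*x1, s2 = h1*x1
print(linear_convolve([1, 2], [3, 4]))
```

The counter reports 4 multiplications for the 2-by-2 case, matching the matrix form above.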

Page 25: Implementation of DSP IC

Linear Convolution

[Figure: linear convolution illustrated by successive linear shifts]

Page 26: Implementation of DSP IC

Linear Shift vs Circular Shift

Conventional shift (linear shift)

Page 27: Implementation of DSP IC


Circular Shift Example

Page 28: Implementation of DSP IC


Periodic/Circular Convolution

Circular Shift

Page 29: Implementation of DSP IC

Circular Convolution Definition

• Suppose two finite-duration sequences x1[n] and x2[n], each of length N

• x3[n] is also a finite-duration sequence of length N

Page 30: Implementation of DSP IC

Computation for Circular Convolution

1. Extend the two sequences periodically with period N (N large enough)

2. Compute the periodic convolution of the two periodic sequences

3. Extract one period of the result, 0 ≤ n ≤ N−1
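The three steps can be sketched in a few lines. The sketch assumes the standard definition $x_3[n] = \sum_{m=0}^{N-1} x_1[m]\, x_2[((n-m))_N]$, which is not spelled out in the transcript; the modulo index plays the role of the periodic extension, and the function name is an illustrative assumption.

```python
def circular_convolve(x1, x2):
    """N-point circular convolution of two length-N sequences."""
    N = len(x1)
    if len(x2) != N:
        raise ValueError("both sequences must have length N")
    # (n - m) % N indexes the periodic extension of x2 (steps 1 and 2);
    # evaluating only n = 0..N-1 keeps a single period (step 3).
    return [sum(x1[m] * x2[(n - m) % N] for m in range(N)) for n in range(N)]


# Convolving with a unit sample delayed by one gives a circular shift by one.
print(circular_convolve([1, 2, 3, 4], [0, 1, 0, 0]))
```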

Page 31: Implementation of DSP IC


Example

Step 1

Step 2

Step 3

Page 32: Implementation of DSP IC

Circular Convolution Property

• Usually, we use the following notation to represent the circular convolution of length N

• Circular convolution property

Page 33: Implementation of DSP IC

Circular Convolution Implementation

• Direct implementation: s = h ⊛ x, where h, x, and s are all N-point sequences

• Example: a 4×4 cyclic convolution requires 16 multiplications and 12 additions

• Complexity of circular convolution ~ O(N²)

Page 34: Implementation of DSP IC

Using Circular Convolution to Implement Linear Convolution

• Consider two sequences, x1[n] of length L and x2[n] of length P

• The linear convolution x3[n] = x1[n] * x2[n] is a sequence of length L+P−1

• Choose N such that N ≥ L+P−1; then the N-point circular convolution of the zero-padded x1[n] and x2[n] equals x3[n]

• The same concept is related to the Winograd algorithm
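A small experiment shows why the padding length matters. This is a minimal sketch; the helper names are illustrative assumptions, and N is set to exactly L+P−1.

```python
def circular_convolve(x1, x2):
    """N-point circular convolution of two length-N sequences."""
    N = len(x1)
    return [sum(x1[m] * x2[(n - m) % N] for m in range(N)) for n in range(N)]


def linear_via_circular(x1, x2):
    """Linear convolution computed as a circular convolution of length L + P - 1."""
    L, P = len(x1), len(x2)
    N = L + P - 1
    a = list(x1) + [0] * (N - L)   # zero-pad x1 to length N
    b = list(x2) + [0] * (N - P)   # zero-pad x2 to length N
    return circular_convolve(a, b)


no_alias = linear_via_circular([1, 2, 3], [1, 1])      # N = 4
aliased = circular_convolve([1, 2, 3], [1, 1, 0])      # N = 3 is too short
print(no_alias, aliased)
```

With N = 3 the last linear-convolution sample wraps around and adds onto the first output sample, which is the time aliasing that N ≥ L+P−1 avoids.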

Page 35: Implementation of DSP IC


Linear Convolution

Page 36: Implementation of DSP IC

Circular Convolution with N=L+P-1

Time aliasing in the circular convolution of two finite-length sequences can be avoided if N ≥ L+P−1

Page 37: Implementation of DSP IC

Concluding Remarks

• The convolution of two finite-length sequences can be interpreted as a circular convolution of sufficiently large length

• Circular convolution can be implemented by the DFT/FFT

• However, in real applications:

– For an FIR system, the input sequence is of indefinite duration

– Storing the entire input signal requires a large processing delay and an indefinite amount of memory

– Solution: block convolution

Page 38: Implementation of DSP IC

Block Convolution

• Step 1: Segment the input sequence into sections of length L

• Step 2: Convolve each section with the finite-length impulse response of length P by using a DFT/FFT of length N = L+P−1

• Step 3: Fit the filtered sections together in an appropriate way

– Overlap-add method

– Overlap-save method

Page 39: Implementation of DSP IC

Overlap-Add Method

y = h * x, with input x[n] and impulse response h[n]

Step 1: segment x[n] into sections and zero-pad each section

Page 40: Implementation of DSP IC

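Step 2 & Step 3

$$y_r[n] = x_r[n] * h[n] = x_r[n] \circledast_N h[n], \quad \text{of length } L+P-1$$

Each filtered section $y_r[n]$ is time-shifted to the position of its input section, and the overlapping samples are added.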
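The overlap-add bookkeeping can be sketched as follows. To keep the example short, each section is filtered with a direct convolution, which stands in for the N-point FFT product used in the lecture; the function names and the block length argument are illustrative assumptions.

```python
def convolve(x, h):
    """Direct linear convolution (stand-in for the per-block FFT product)."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y


def overlap_add(x, h, L):
    """Filter x with h by processing sections of length L and adding the overlaps."""
    P = len(h)
    y = [0] * (len(x) + P - 1)
    for start in range(0, len(x), L):
        section = x[start:start + L]          # step 1: segment the input
        yr = convolve(section, h)             # step 2: up to L + P - 1 output samples
        for i, v in enumerate(yr):            # step 3: time shift and add
            y[start + i] += v
    return y


x = list(range(1, 11))
h = [1, 2, 1]
print(overlap_add(x, h, 4) == convolve(x, h))
```

Each section's output is P−1 samples longer than the section itself, so consecutive outputs overlap by P−1 samples, and those are the samples that get added.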

Page 41: Implementation of DSP IC

Fast Convolution with the FFT

• Given two sequences x1 and x2 of length N1 and N2, respectively

– Direct implementation requires N1·N2 complex multiplications

• Consider using the FFT to convolve the two sequences:

– Pick N, a power of 2, such that N ≥ N1+N2−1

– Zero-pad x1 and x2 to length N

– Compute the N-point FFTs of the zero-padded x1 and x2 to obtain X1 and X2

– Multiply X1 and X2

– Apply the IFFT to obtain the convolution sum of x1 and x2

– Computational complexity: 2(N/2) log2 N + N + (N/2) log2 N complex multiplications
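The procedure can be sketched end to end in plain Python. The compact recursive radix-2 FFT below is only a helper chosen to keep the example self-contained; the function names are illustrative assumptions.

```python
import cmath


def fft(x, inverse=False):
    """Recursive radix-2 FFT (unscaled); len(x) must be a power of 2."""
    N = len(x)
    if N == 1:
        return list(x)
    even = fft(x[0::2], inverse)
    odd = fft(x[1::2], inverse)
    sign = 1 if inverse else -1
    out = [0] * N
    for k in range(N // 2):
        t = cmath.exp(sign * 2j * cmath.pi * k / N) * odd[k]
        out[k] = even[k] + t
        out[k + N // 2] = even[k] - t
    return out


def fast_convolve(x1, x2):
    """Linear convolution of two real sequences via zero-padded FFTs."""
    n_out = len(x1) + len(x2) - 1
    N = 1
    while N < n_out:                      # smallest power of 2 with N >= N1 + N2 - 1
        N *= 2
    X1 = fft(list(x1) + [0] * (N - len(x1)))
    X2 = fft(list(x2) + [0] * (N - len(x2)))
    Y = [a * b for a, b in zip(X1, X2)]   # N pointwise multiplications
    y = fft(Y, inverse=True)
    return [v.real / N for v in y[:n_out]]   # apply the 1/N IFFT scaling


y = fast_convolve([1, 2, 3], [1, 1])
```

The result agrees with the direct convolution [1, 3, 5, 3] up to floating-point rounding.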

Page 42: Implementation of DSP IC

Example

• A sequence x[n] of length 1024

• FIR filter h[n] of length 34

• Direct computation: 34 × 1024 = 34816 multiplications

• Using radix-2 FFT: 35840 multiplications (N = 2048)

• Using overlap-add radix-2 FFT:

– x[n] is segmented into a set of contiguous blocks of equal length 95

– Apply radix-2 FFT of length 128

– Each segment requires 1472 multiplications

– This algorithm requires a total of 16192 multiplications
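The figures in this example follow from the complexity formula 2(N/2) log2 N + N + (N/2) log2 N. The sketch below reproduces them; the function names are illustrative assumptions, and, as the per-segment figure of 1472 implies, the FFT of h[n] is counted again for every segment.

```python
from math import ceil, log2


def fft_mults(N):
    """(N/2) * log2(N) complex multiplications for one radix-2 FFT."""
    return (N // 2) * int(log2(N))


def fast_conv_mults(N):
    """Two forward FFTs, N pointwise products, one inverse FFT."""
    return 2 * fft_mults(N) + N + fft_mults(N)


direct = 34 * 1024                      # direct computation
single_fft = fast_conv_mults(2048)      # one long FFT, N = 2048 >= 1024 + 34 - 1
per_segment = fast_conv_mults(128)      # block length 95: 95 + 34 - 1 = 128
segments = ceil(1024 / 95)              # 11 segments cover the input
overlap_add_total = segments * per_segment

print(direct, single_fft, per_segment, overlap_add_total)
```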

Page 43: Implementation of DSP IC

Discrete Fourier Transform (recap)

• The DFT/IDFT pair can be implemented by using the same hardware

• Direct computation requires N² complex multiplications and N(N−1) complex additions: N complex multiplications and N−1 complex additions per output sample

Page 44: Implementation of DSP IC

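Decimation in Time

• Split the time index n into even samples, n = 2ℓ, and odd samples, n = 2ℓ+1

• The odd-sample half is multiplied by the twiddle factor

• N + 2(N/2)² complex multiplications vs. N² complex multiplications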
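The count N + 2(N/2)² follows from the even/odd split. As a sketch of the standard derivation (the names G[k] and H[k] for the two half-length DFTs are assumptions introduced here for illustration):

```latex
\begin{aligned}
X[k] &= \sum_{\ell=0}^{N/2-1} x[2\ell]\, W_N^{2\ell k}
      + \sum_{\ell=0}^{N/2-1} x[2\ell+1]\, W_N^{(2\ell+1)k} \\
     &= \underbrace{\sum_{\ell=0}^{N/2-1} x[2\ell]\, W_{N/2}^{\ell k}}_{G[k]}
      + W_N^{k} \underbrace{\sum_{\ell=0}^{N/2-1} x[2\ell+1]\, W_{N/2}^{\ell k}}_{H[k]},
      \qquad k = 0, 1, \ldots, N-1
\end{aligned}
```

Here $W_N^{2\ell k} = W_{N/2}^{\ell k}$ was used. The two N/2-point DFTs cost $2(N/2)^2$ complex multiplications when computed directly, and the twiddle factors $W_N^{k}$ add N more.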

Page 45: Implementation of DSP IC
Page 46: Implementation of DSP IC

Flow Graph of the DIT FFT

Page 47: Implementation of DSP IC
Page 48: Implementation of DSP IC

8-point DIT DFT

Page 49: Implementation of DSP IC

Remarks

• It requires v = log2 N stages. Each stage has N/2 butterfly operations (radix-2 DIT FFT), each of which requires 2 complex multiplications and 2 complex additions

• Each stage therefore has N complex multiplications and N complex additions

• The number of complex multiplications (as well as additions) is equal to N log2 N

• By the symmetry property, the butterfly operation can be simplified:

$$W_N^{r+N/2} = W_N^{r}\, W_N^{N/2} = W_N^{r}\, e^{-j(2\pi/N)(N/2)} = -W_N^{r}$$

– Before: 2 complex multiplications and 2 complex additions per butterfly

– After: 1 complex multiplication and 2 complex additions per butterfly

Page 50: Implementation of DSP IC

8-point FFT

Normal order | Bit-reversed order

Page 51: Implementation of DSP IC

In-Place Computation

[Figure: three-stage flow graph for N = 8. Register arrays X0[·] (input to stage 1), X1[·] (after stage 1), X2[·] (after stage 2), and X3[·] (after stage 3), each indexed 000 to 111.]

The same register array can be used in each stage
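In-place computation can be sketched with a single list that is overwritten stage by stage. The arrangement below (bit-reversed input order, normal output order) is one common choice and is an assumption here, since the flow graphs themselves are not in the transcript; the function names are illustrative.

```python
import cmath


def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i, e.g. 001 -> 100 for bits = 3."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)
        i >>= 1
    return r


def fft_in_place(a):
    """Radix-2 DIT FFT that overwrites the list a; len(a) must be a power of 2."""
    N = len(a)
    bits = N.bit_length() - 1
    assert 1 << bits == N, "length must be a power of 2"
    for i in range(N):                       # bit-reversed sorting of the input
        j = bit_reverse(i, bits)
        if i < j:
            a[i], a[j] = a[j], a[i]
    span = 1                                 # distance between butterfly inputs
    while span < N:                          # log2(N) stages
        step = cmath.exp(-1j * cmath.pi / span)   # twiddle increment W_(2*span)
        for start in range(0, N, 2 * span):
            w = 1.0 + 0j
            for k in range(span):            # one butterfly: 1 multiplication, 2 additions
                top = a[start + k]
                t = w * a[start + k + span]
                a[start + k] = top + t
                a[start + k + span] = top - t
                w *= step
        span *= 2
    return a


result = fft_in_place([1, 2, 3, 4])
```

Each butterfly reads two entries and writes its two results back to the same positions, so no second array is needed.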

Page 52: Implementation of DSP IC

8-point FFT

Normal order Bit-reversed order

Page 53: Implementation of DSP IC

Normal-Order Sorting vs. Bit-Reversed Sorting

[Figure: sorting trees for normal order and bit-reversed order; branch labels: even/odd and top/bottom]

Page 54: Implementation of DSP IC

DFT vs. Radix-2 FFT

• DFT: N² complex multiplications and N(N−1) complex additions

• Recall that each butterfly operation requires one complex multiplication and two complex additions

• FFT: (N/2) log2 N complex multiplications and N log2 N complex additions

• In-place computation: the input and output nodes of each butterfly operation are horizontally adjacent, so only one storage array is required

Page 55: Implementation of DSP IC

Decimation in Frequency (DIF)

• Recall that the DFT is

$$X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{nk}, \quad 0 \le k \le N-1$$

• The DIT FFT algorithm is based on decomposing the DFT computation by forming smaller subsequences in the time-domain index n: n = 2ℓ or n = 2ℓ+1

• One can instead divide the output sequence X[k], in the frequency domain, into smaller subsequences: k = 2r or k = 2r+1 (substitution of variables)

Page 56: Implementation of DSP IC

DIF FFT Algorithm (1)

is just N/2-point DFT. Similarly,

Page 57: Implementation of DSP IC

DIF FFT Algorithm (2)

v = log2 N stages; each stage has N/2 butterfly operations.

(N/2) log2 N complex multiplications in total; N complex additions per stage, i.e. N log2 N complex additions in total
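A recursive sketch makes the DIF structure concrete. It assumes the standard decomposition, in which the even outputs X[2r] are the N/2-point DFT of x[n] + x[n+N/2] and the odd outputs X[2r+1] are the N/2-point DFT of (x[n] − x[n+N/2])·W_N^n; the function name is illustrative.

```python
import cmath


def fft_dif(x):
    """Recursive radix-2 decimation-in-frequency FFT; len(x) must be a power of 2."""
    N = len(x)
    if N == 1:
        return list(x)
    half = N // 2
    # DIF butterfly: add/subtract first, then multiply the difference by W_N^n
    g = [x[n] + x[n + half] for n in range(half)]
    h = [(x[n] - x[n + half]) * cmath.exp(-2j * cmath.pi * n / N)
         for n in range(half)]
    out = [0] * N
    out[0::2] = fft_dif(g)    # even-indexed outputs X[2r]
    out[1::2] = fft_dif(h)    # odd-indexed outputs X[2r+1]
    return out


X = fft_dif([1, 2, 3, 4])
```

In contrast with the DIT butterfly, the twiddle multiplication here comes after the subtraction.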

Page 58: Implementation of DSP IC

Remarks

• The basic butterfly operations of the DIT FFT and the DIF FFT form a transposed pair

• The I/O values of the DIT FFT and the DIF FFT are the same

• Applying the transpose transform to a DIT FFT algorithm yields a DIF FFT algorithm

[Figure: DIF butterfly unit | DIT butterfly unit]

Page 59: Implementation of DSP IC

Fast Convolution with the FFT

• Given two sequences x1 and x2 of length N1 and N2, respectively

– Direct implementation requires N1·N2 complex multiplications

• Consider using the FFT to convolve the two sequences:

– Pick N, a power of 2, such that N ≥ N1+N2−1

– Zero-pad x1 and x2 to length N

– Compute the N-point FFTs of the zero-padded x1 and x2 to obtain X1 and X2

– Multiply X1 and X2

– Apply the IFFT to obtain the convolution sum of x1 and x2

– Computational complexity: 2(N/2) log2 N + N + (N/2) log2 N complex multiplications

Page 60: Implementation of DSP IC

Implementation Issues

• Radix-2, Radix-4, Radix-8, Split-Radix, Radix-2², …

• I/O indexing

• In-place computation

– Bit-reversed sorting is necessary

– Efficient use of memory

– Random (not sequential) access of memory; an address generator unit is required

– Good for cascade form: FFT followed by IFFT (or vice versa), e.g. the fast convolution algorithm

• Twiddle factors

– Look-up table

– CORDIC rotator

Page 61: Implementation of DSP IC

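FIR Filters

$$\begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} h_0 & 0 & 0 \\ h_1 & h_0 & 0 \\ 0 & h_1 & h_0 \\ 0 & 0 & h_1 \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix}$$

Transform-domain | Time-domain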

Page 62: Implementation of DSP IC

Example: Linear Phase FIR

Linear-phase FIR filter: approximately constant frequency-response magnitude and linear phase (constant group delay) in the pass-band

• Direct N-tap form: N multipliers, N−1 adders

• Exploiting substructure sharing (coefficient symmetry) to reduce area:

– (N+1)/2 multipliers, N−1 adders, if N is odd

– N/2 multipliers, N−1 adders, if N is even
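The multiplier saving comes from adding the two input samples that share a coefficient before multiplying. The sketch below assumes symmetric coefficients, h[k] = h[N−1−k], and a `window` list holding the N most recent input samples; both names and the function names are illustrative assumptions.

```python
def fir_direct(h, window):
    """One output sample in direct form: N multiplications, N - 1 additions."""
    return sum(hk * xk for hk, xk in zip(h, window))


def fir_symmetric(h, window):
    """One output sample using symmetry h[k] == h[N-1-k]: about N/2 multiplications."""
    N = len(h)
    acc = 0
    for k in range(N // 2):
        # pre-add the two samples that share coefficient h[k], then multiply once
        acc += h[k] * (window[k] + window[N - 1 - k])
    if N % 2:
        acc += h[N // 2] * window[N // 2]   # unpaired centre tap for odd N
    return acc


h = [1, 2, 3, 2, 1]            # symmetric (linear-phase) coefficients, N = 5
window = [5, 4, 3, 2, 1]       # x[n], x[n-1], ..., x[n-4]
print(fir_direct(h, window), fir_symmetric(h, window))
```

For N = 5 the folded form uses 3 multiplications instead of 5, while the total number of additions stays at N − 1 = 4.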

Page 63: Implementation of DSP IC

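An Efficient Decomposition

• Example: 2-fold decomposition

$$\begin{aligned} H(z) &= h[0] + h[1]z^{-1} + h[2]z^{-2} + h[3]z^{-3} + h[4]z^{-4} + h[5]z^{-5} + h[6]z^{-6} \\ &= \underbrace{\left(h[0] + h[2]z^{-2} + h[4]z^{-4} + h[6]z^{-6}\right)}_{H_0(z^2)} + z^{-1} \underbrace{\left(h[1] + h[3]z^{-2} + h[5]z^{-4}\right)}_{H_1(z^2)} \end{aligned}$$

• Example: 3-fold decomposition

$$\begin{aligned} H(z) &= h[0] + h[1]z^{-1} + h[2]z^{-2} + h[3]z^{-3} + h[4]z^{-4} + h[5]z^{-5} + h[6]z^{-6} \\ &= \underbrace{\left(h[0] + h[3]z^{-3} + h[6]z^{-6}\right)}_{H_0(z^3)} + z^{-1} \underbrace{\left(h[1] + h[4]z^{-3}\right)}_{H_1(z^3)} + z^{-2} \underbrace{\left(h[2] + h[5]z^{-3}\right)}_{H_2(z^3)} \end{aligned}$$

• General case (N-fold decomposition)

$$H(z) = \sum_{k} h[k]\, z^{-k} = \sum_{l=0}^{N-1} z^{-l} H_l(z^N), \quad \text{where } H_l(z) = \sum_{k} h[Nk+l]\, z^{-k}$$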
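The decomposition amounts to taking every N-th coefficient with offset l. This is a minimal sketch that also checks the identity numerically at one value of z; the parameter name `folds` (the N of the general formula) and the function names are illustrative assumptions.

```python
def polyphase(h, folds):
    """Split h into `folds` sub-filters: component l holds h[l], h[l+folds], ..."""
    return [h[l::folds] for l in range(folds)]


def evaluate(coeffs, z):
    """Evaluate sum_k coeffs[k] * z^(-k)."""
    return sum(c * z ** (-k) for k, c in enumerate(coeffs))


def evaluate_polyphase(h, folds, z):
    """Evaluate sum_l z^(-l) * H_l(z^folds) from the polyphase components."""
    return sum(z ** (-l) * evaluate(sub, z ** folds)
               for l, sub in enumerate(polyphase(h, folds)))


h = [1, 2, 3, 4, 5, 6, 7]      # stands for h[0] ... h[6]
print(polyphase(h, 2))         # [[1, 3, 5, 7], [2, 4, 6]]
print(polyphase(h, 3))         # [[1, 4, 7], [2, 5], [3, 6]]
```

Evaluating both sides at any nonzero z, for example z = 2.0, gives the same value for H(z) up to rounding.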

Page 64: Implementation of DSP IC

Traditional Parallel Architecture

• 2-fold parallel architecture

– Four N/2-tap sub-filters

– 4(N/2) multiplications

– 4(N/2 − 1) + 2 additions

Page 65: Implementation of DSP IC

Traditional Parallel FIR

An L-parallel FIR filter built from sub-filters of length N/L requires

1. L²(N/L) multiplications, i.e. LN multiplications

2. L²(N/L − 1) + L(L−1) additions, i.e. L(N−1) additions

~ LN multiply-add operations

Page 66: Implementation of DSP IC
Page 67: Implementation of DSP IC

Fast FIR Algorithm (FFA)

• First, apply the L-fold polyphase decomposition to H(z)

– There are L sub-filters of length N/L

• Then apply the Winograd algorithm

– The product of 2 polynomials of degree L−1 can be implemented by using 2L−1 product terms

– Each product term is equivalent to a filtering operation in the block formulation

– Consequently, it can be realized using approximately 2L−1 FIR filters of length N/L. It requires (2L−1)(N/L) = 2N − N/L multiplications, e.g. 1.5N for L = 2

Page 68: Implementation of DSP IC

FIR Using Polyphase Decomposition

Page 69: Implementation of DSP IC

Traditional Parallel Architecture

• 2-fold parallel architecture

– Four N/2-tap sub-filters

– 4(N/2) multiplications

– 4(N/2 − 1) + 2 additions

Page 70: Implementation of DSP IC

2-Parallel Fast FIR Filter

Page 71: Implementation of DSP IC

2-Parallel FFA

• It requires 3 distinct sub-filters of length N/2 and 4 pre/post-processing additions

• In total, it requires 1.5N multiplications and 3(N/2 − 1) + 4 = 1.5N + 1 additions
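The three sub-filters can be seen in a short sketch. It assumes the usual 2-parallel FFA relations, Y0 = H0·X0 + z⁻²·H1·X1 and Y1 = (H0+H1)(X0+X1) − H0·X0 − H1·X1, applied here to finite, even-length sequences; the function names are illustrative assumptions.

```python
def convolve(x, h):
    """Direct linear convolution, used here as the sub-filter operation."""
    y = [0] * (len(x) + len(h) - 1)
    for i, xi in enumerate(x):
        for k, hk in enumerate(h):
            y[i + k] += xi * hk
    return y


def ffa2(x, h):
    """2-parallel FFA for even-length x and h, using 3 half-length sub-filters."""
    x0, x1 = x[0::2], x[1::2]              # even and odd input phases
    h0, h1 = h[0::2], h[1::2]              # polyphase components of the filter
    a = convolve(x0, h0)                   # sub-filter 1: H0 * X0
    b = convolve(x1, h1)                   # sub-filter 2: H1 * X1
    c = convolve([p + q for p, q in zip(x0, x1)],      # pre-addition X0 + X1
                 [p + q for p, q in zip(h0, h1)])      # sub-filter 3: H0 + H1
    n = len(a)
    y = [0] * (2 * n + 1)
    for k in range(n + 1):                 # even outputs: H1*X1 delayed by one block
        y[2 * k] = (a[k] if k < n else 0) + (b[k - 1] if k >= 1 else 0)
    for k in range(n):                     # odd outputs: post-subtractions
        y[2 * k + 1] = c[k] - a[k] - b[k]
    return y


print(ffa2([1, 2, 3, 4], [1, 2]))
```

The output matches the direct convolution of the same sequences, while only three half-length convolutions are computed instead of four.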
