implementation of dsp ic
Post on 09-Apr-2022
VSP Lecture4 - Fast Algorithms (cwliu@twins.ee.nctu.edu.tw)
Implementation of DSP IC
Lecture 4 Fast Algorithms for Digital Signal Processing
Algorithm Strength Reduction
• Strength reduction reduces hardware complexity by exploiting substructure sharing. This leads to less silicon area or lower power consumption in a VLSI ASIC, or to a shorter iteration period in a programmable DSP implementation.
• Strength reduction enables the design of parallel FIR filters with a less-than-linear increase in hardware.
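Algorithm Strength Reduction
• Motivation
– The number of strong operations, such as multiplications, is reduced, possibly at the expense of an increase in the number of weaker operations, such as additions.
• Reduce computational complexity
• Example: complex multiplication
– (a+jb)(c+jd) = e+jf, where a, b, c, d, e, f ∈ R.
– The direct implementation, e = ac - bd and f = ad + bc, requires 4 multiplications and 2 additions.
– However, the number of multiplications can be reduced to 3 at the expense of 3 extra additions by using the identities
$$ac - bd = a(c - d) + d(a - b), \qquad ad + bc = b(c + d) + d(a - b)$$
– In matrix form:
$$\begin{bmatrix} e \\ f \end{bmatrix} = \begin{bmatrix} 1 & 0 & 1 \\ 0 & 1 & 1 \end{bmatrix} \begin{bmatrix} c-d & 0 & 0 \\ 0 & c+d & 0 \\ 0 & 0 & d \end{bmatrix} \begin{bmatrix} 1 & 0 \\ 0 & 1 \\ 1 & -1 \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix}$$
– 3 multiplications, 5 additions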
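A minimal Python sketch comparing the direct 4-multiplication form with the 3-multiplication identities above; the function names are illustrative and not part of the lecture. As an aside, if (c, d) is a fixed coefficient, c - d and c + d can be precomputed, leaving 3 multiplications and 3 additions at run time.

```python
def complex_mult_direct(a, b, c, d):
    """(a + jb)(c + jd) with 4 real multiplications and 2 real additions."""
    e = a * c - b * d
    f = a * d + b * c
    return e, f


def complex_mult_3m(a, b, c, d):
    """Same product with 3 real multiplications and 5 real additions.

    Uses e = a(c - d) + d(a - b) and f = b(c + d) + d(a - b).
    """
    shared = d * (a - b)          # product shared by both outputs
    e = a * (c - d) + shared
    f = b * (c + d) + shared
    return e, f


print(complex_mult_direct(1, 2, 3, 4))  # prints (-5, 10)
print(complex_mult_3m(1, 2, 3, 4))      # prints (-5, 10)
```

The shared term d(a - b) is what removes one multiplication; the price is the longer path through the pre-additions, which is the critical-path trade-off discussed next.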
Complex Multiplication
• Reducing the number of strong operations lowers the switched capacitance; however, it increases the critical path.
• Trade-offs to evaluate: speed? area? power?
Review of Discrete Fourier Transform
4 Forms of Fourier Analysis
(Figure: the four forms of Fourier analysis; “sampled” frequency.)
• Continuous-Time and Continuous-Frequency: the time signal is continuous and aperiodic; the spectrum is continuous and aperiodic.
• Continuous-Time and Discrete-Frequency (Fourier series of periodic continuous signals): the time signal is periodic and continuous; the spectrum is discrete and aperiodic.
• Discrete-Time and Continuous-Frequency (Fourier transform of aperiodic discrete signals): the time signal is discrete and aperiodic; the spectrum is continuous and periodic.
Discrete Fourier Transform
• The DFT values are identical to samples of the Fourier transform.
• In DSP applications, we can store only a finite number of samples.
• We can compute the spectrum only at specific discrete values of ω.
Discrete Fourier Transform
• Discrete Fourier transform (DFT) pair:
$$X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{kn}, \quad k = 0, 1, \ldots, N-1$$
$$x[n] = \frac{1}{N} \sum_{k=0}^{N-1} X[k]\, W_N^{-kn}, \quad n = 0, 1, \ldots, N-1$$
where $W_N = e^{-j2\pi/N}$, so that $W_N^{kn} = e^{-j2\pi kn/N}$.
• The DFT and IDFT can be implemented by using the same hardware.
• Direct computation requires N^2 complex multiplications and N(N-1) complex additions: each output X[k] takes N complex multiplications and N-1 complex additions.
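The DFT pair can be sketched directly in Python. The `idft` below reuses `dft` through the conjugation identity x[n] = conj(DFT{conj(X)})[n] / N, which is one way to see why the DFT and IDFT can share hardware; both function names are hypothetical.

```python
import cmath


def dft(x):
    """Direct N-point DFT: X[k] = sum_n x[n] W_N^{kn}, W_N = exp(-j*2*pi/N).

    Uses N^2 complex multiplications (N per output X[k]).
    """
    n_pts = len(x)
    w = cmath.exp(-2j * cmath.pi / n_pts)  # W_N
    # (k * n) % N keeps the exponent small; in hardware W_N^{kn} is a stored constant.
    return [sum(x[n] * w ** ((k * n) % n_pts) for n in range(n_pts))
            for k in range(n_pts)]


def idft(X):
    """Inverse DFT computed with the forward DFT: x = conj(DFT(conj(X))) / N."""
    n_pts = len(X)
    return [v.conjugate() / n_pts for v in dft([c.conjugate() for c in X])]


X = dft([1, 2, 3, 4])   # approximately [10, -2+2j, -2, -2-2j]
x_back = idft(X)        # approximately [1, 2, 3, 4], as complex numbers
```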
More About DFT
• Properties of the Discrete Fourier Transform
• Linear Convolution and the Discrete Fourier Transform
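Periodic Sequence
• Consider a periodic sequence $\tilde{x}[n]$ of period N, i.e., $\tilde{x}[n] = \tilde{x}[n + rN]$ for any integer r.
• The sequence can be represented by a Fourier series:
$$\tilde{x}[n] = \frac{1}{N} \sum_{k} \tilde{X}[k]\, e^{j(2\pi/N)kn}$$
• The Fourier series for any discrete-time signal with period N requires only N harmonically related complex exponentials, because
$$e_{k+lN}[n] = e^{j(2\pi/N)(k+lN)n} = e^{j(2\pi/N)kn} = e_k[n]$$
so that
$$\tilde{x}[n] = \frac{1}{N} \sum_{k=0}^{N-1} \tilde{X}[k]\, e^{j(2\pi/N)kn}$$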
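• Apply the orthogonality property of the complex exponentials:
$$\frac{1}{N}\sum_{n=0}^{N-1} e^{j(2\pi/N)(k-r)n} = \begin{cases} 1, & k - r = mN,\ m \text{ an integer} \\ 0, & \text{otherwise} \end{cases}$$
• Multiply both sides of the Fourier-series representation of $\tilde{x}[n]$ by $e^{-j(2\pi/N)rn}$, sum over n = 0, ..., N-1, and interchange the order of summation:
$$\sum_{n=0}^{N-1} \tilde{x}[n]\, e^{-j(2\pi/N)rn} = \sum_{k=0}^{N-1} \tilde{X}[k] \left[\frac{1}{N}\sum_{n=0}^{N-1} e^{j(2\pi/N)(k-r)n}\right] = \tilde{X}[r]$$
• Hence $\tilde{X}[k] = \sum_{n=0}^{N-1} \tilde{x}[n]\, e^{-j(2\pi/N)kn}$, and the coefficients are also periodic with period N: $\tilde{X}[k+N] = \tilde{X}[k]$.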
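DFS Representation of a Periodic Sequence
Synthesis equation: $\tilde{x}[n] = \frac{1}{N}\sum_{k=0}^{N-1} \tilde{X}[k]\, W_N^{-kn}$
Analysis equation: $\tilde{X}[k] = \sum_{n=0}^{N-1} \tilde{x}[n]\, W_N^{kn}$
where $\tilde{x}[n]$ and $\tilde{X}[k]$ are periodic sequences with period N.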
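Physical Significance
• Let x[n] be one period of $\tilde{x}[n]$:
$$x[n] = \begin{cases} \tilde{x}[n], & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$
• Then the DFS coefficients are samples of the Fourier transform of x[n]:
$$\tilde{X}[k] = X(e^{j\omega})\big|_{\omega = 2\pi k/N}$$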
Example: $\tilde{X}[k]$ vs. $X(e^{j\omega})$
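Sampling the Fourier Transform
• Suppose the Fourier transform $X(e^{j\omega})$ of an aperiodic sequence x[n] exists. Sample it at N equally spaced frequencies on the unit circle, with spacing 2π/N:
$$\tilde{X}[k] = X(e^{j(2\pi/N)k})$$
• The sampled sequence $\tilde{X}[k]$ is periodic with period N.
• Since $X(e^{j\omega}) = \sum_{m=-\infty}^{\infty} x[m]\, e^{-j\omega m}$, the sequence whose DFS coefficients are $\tilde{X}[k]$ is
$$\tilde{x}[n] = \frac{1}{N}\sum_{k=0}^{N-1} \tilde{X}[k]\, W_N^{-kn} = \sum_{m=-\infty}^{\infty} x[m] \left[\frac{1}{N}\sum_{k=0}^{N-1} W_N^{-k(n-m)}\right] = \sum_{r=-\infty}^{\infty} x[n - rN]$$
or, equivalently,
$$\tilde{x}[n] = x[n] * \sum_{r=-\infty}^{\infty} \delta[n - rN]$$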
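Aliasing Problem 1
• x[n] is an infinite-length sequence: the shifted replicas x[n - rN] that make up $\tilde{x}[n]$ overlap, so $\tilde{x}[n]$ is a time-aliased version of x[n].
(Figure: x[n] and $\tilde{x}[n]$.)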
Aliasing Problem 2
• x[n] is a finite-length sequence, nonzero only for 0 ≤ n ≤ M-1.
• Consider the case N < M: the replicas overlap, so $\tilde{x}[n] \ne x[n]$ for some 0 ≤ n ≤ N-1 (time aliasing).
(Figure: $\tilde{x}[n]$.)
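Concluding Remarks
• In the case N ≥ M there is no time aliasing, and x[n] can be recovered from one period of $\tilde{x}[n]$:
$$x[n] = \tilde{x}[n]\, R_N[n]$$
or
$$x[n] = \begin{cases} \tilde{x}[n], & 0 \le n \le N-1 \\ 0, & \text{otherwise} \end{cases}$$
where $R_N[n] = 1$ for 0 ≤ n ≤ N-1 and 0 otherwise.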
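A small numerical check of the aliasing relation $\tilde{x}[n] = \sum_r x[n - rN]$, written as a sketch with hypothetical helper names: sample the Fourier transform of a length-5 sequence at N points, invert the DFS, and compare with the time-domain sum of shifted replicas for N ≥ M and N < M.

```python
import cmath


def sample_dtft(x, n_pts):
    """N equally spaced samples X(e^{j 2 pi k / N}) of the DTFT of a finite x[n]."""
    return [sum(v * cmath.exp(-2j * cmath.pi * k * m / n_pts) for m, v in enumerate(x))
            for k in range(n_pts)]


def inverse_dfs(X):
    """One period of x~[n] = (1/N) sum_k X~[k] W_N^{-kn}."""
    n_pts = len(X)
    return [sum(X[k] * cmath.exp(2j * cmath.pi * k * n / n_pts) for k in range(n_pts)) / n_pts
            for n in range(n_pts)]


def time_alias(x, n_pts):
    """One period of sum_r x[n - rN], computed directly in the time domain."""
    out = [0] * n_pts
    for m, v in enumerate(x):
        out[m % n_pts] += v  # sample m lands on n = m mod N
    return out


x = [1, 2, 3, 4, 5]                         # M = 5
no_alias = inverse_dfs(sample_dtft(x, 8))   # N = 8 >= M: x[n] followed by zeros
aliased = inverse_dfs(sample_dtft(x, 3))    # N = 3 < M: equals time_alias(x, 3) = [5, 7, 3]
```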
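Circular Shift of a Sequence
(Figure: x[n], its periodic extension $\tilde{x}[n]$, the shifted periodic sequence, and one period of it obtained with $R_N[n]$, for a shift of 2 samples with N = 15. The circular shift is a rotation of the cylinder on which the N samples are wrapped.)
• In general, a circular shift by m samples is $x_1[n] = \tilde{x}[n - m]\, R_N[n] = x[((n - m))_N]$.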
Circular Shift of a Sequence
(Figure: the same construction for a shift of 13 samples with N = 15; again a rotation of the cylinder.)
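A sketch of the circular shift $x[((n - m))_N]$ in Python (helper names are hypothetical). With N = 15, shifting by 13 samples gives the same rotation as shifting by -2 samples. The last lines check the standard DFT property that a circular shift by m multiplies X[k] by $W_N^{km}$.

```python
import cmath


def circular_shift(x, m):
    """x1[n] = x~[n - m] R_N[n] = x[((n - m))_N] for 0 <= n <= N-1."""
    n_pts = len(x)
    return [x[(n - m) % n_pts] for n in range(n_pts)]


def dft(x):
    n_pts = len(x)
    return [sum(v * cmath.exp(-2j * cmath.pi * k * n / n_pts) for n, v in enumerate(x))
            for k in range(n_pts)]


x = list(range(15))                                        # N = 15
same = circular_shift(x, 13) == circular_shift(x, -2)      # True: 13 = -2 (mod 15)

# DFT view of the same operation: shifting by m multiplies X[k] by W_N^{km}.
w = cmath.exp(-2j * cmath.pi / 15)
X, X1 = dft(x), dft(circular_shift(x, 2))
max_err = max(abs(X1[k] - w ** (2 * k) * X[k]) for k in range(15))  # rounding error only
```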
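Review of Convolution
• Given two sequences:
– Data sequence x_i, 0 ≤ i ≤ N-1, of length N
– Filter sequence h_i, 0 ≤ i ≤ L-1, of length L
• Linear convolution of the L-point sequence h and the N-point sequence x gives the (L+N-1)-point sequence s (with x_i = 0 outside 0 ≤ i ≤ N-1):
$$s_i = h_i * x_i = \sum_{k=0}^{L-1} h_k\, x_{i-k}, \quad i = 0, 1, \ldots, L+N-2$$
• Direct computation requires NL multiplications. For example, the 2-by-2 convolution
$$\begin{bmatrix} s_0 \\ s_1 \\ s_2 \end{bmatrix} = \begin{bmatrix} h_0 & 0 \\ h_1 & h_0 \\ 0 & h_1 \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \end{bmatrix}$$
requires 4 multiplications and 1 addition.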
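A direct-form sketch of the linear convolution above; the function name and the numbers in the example are illustrative. It counts multiplications to confirm the NL figure and reproduces the 2-by-2 case.

```python
def linear_convolution(h, x):
    """s_i = sum_k h_k x_{i-k}, i = 0, ..., L+N-2 (direct form).

    Returns (s, number_of_multiplications); the count is always L * N.
    """
    s = [0] * (len(h) + len(x) - 1)
    mults = 0
    for k, hk in enumerate(h):
        for j, xj in enumerate(x):
            s[k + j] += hk * xj   # h_k x_j contributes to output i = k + j
            mults += 1
    return s, mults


s, mults = linear_convolution([2, 3], [5, 7])   # 2-by-2 case: h = (2, 3), x = (5, 7)
print(s, mults)                                 # prints [10, 29, 21] 4
```

Only s_1 = h_1 x_0 + h_0 x_1 needs an addition, which matches the "4 multiplications and 1 addition" count.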
Linear Convolution
(Figure: successive linear shifts of one sequence against the other.)
Linear Shift vs Circular Shift
(Figure: conventional shift (linear shift) compared with circular shift.)
Circular Shift Example
Periodic/Circular Convolution
Circular Shift
Circular Convolution Definition
• Suppose x1[n] and x2[n] are two finite-duration sequences of length N.
• Their N-point circular convolution
$$x_3[n] = \sum_{m=0}^{N-1} x_1[m]\, x_2[((n - m))_N], \quad 0 \le n \le N-1$$
is also a finite-duration sequence of length N.
Computation for Circular Convolution
1. Periodically extend the two sequences with period N (large enough).
2. Compute the periodic convolution of the two periodic sequences.
3. Extract one period, i.e., the samples for 0 ≤ n ≤ N-1.
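The three steps can be sketched in Python as follows (the function name is hypothetical). The periodic extension is modeled by indexing modulo N, and only n = 0, ..., N-1 is evaluated, so Steps 2 and 3 merge into one loop. This is the direct O(N^2) method: N = 4 uses 16 multiplications.

```python
def circular_convolution(x1, x2):
    """N-point circular convolution following the three steps above."""
    n_pts = len(x1)
    if len(x2) != n_pts:
        raise ValueError("both sequences must have length N")

    # Step 1: periodic extension, x~[n] = x[((n))_N]
    def ext1(n):
        return x1[n % n_pts]

    def ext2(n):
        return x2[n % n_pts]

    # Steps 2 and 3: periodic convolution, keeping only 0 <= n <= N-1
    return [sum(ext1(m) * ext2(n - m) for m in range(n_pts)) for n in range(n_pts)]


print(circular_convolution([1, 2, 3, 4], [1, 0, 0, 1]))   # prints [3, 5, 7, 5]
```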
Example
(Figure: Step 1, Step 2, and Step 3 applied to two sequences.)
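Circular Convolution Property
• Usually, we use the following notation to represent the circular convolution of length N:
$$x_3[n] = x_1[n] \circledast_N x_2[n]$$
• Circular convolution property, with X1[k] and X2[k] the N-point DFTs of x1[n] and x2[n]:
$$x_1[n] \circledast_N x_2[n] \;\longleftrightarrow\; X_1[k]\, X_2[k]$$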
Circular Convolution Implementation
• Direct implementation: the N-point sequence h and the N-point sequence x give the N-point sequence s.
• A 4×4 cyclic convolution requires 16 multiplications and 12 additions.
• In general, direct circular convolution costs O(N^2).
Using Circular Convolution to Implement Linear Convolution
• Consider two sequences, x1[n] of length L and x2[n] of length P.
• The linear convolution x3[n] = x1[n] * x2[n] is a sequence of length L+P-1.
• Choose N such that N ≥ L+P-1; then the N-point circular convolution of the zero-padded sequences equals the linear convolution:
$$x_1[n] \circledast_N x_2[n] = x_1[n] * x_2[n], \quad 0 \le n \le N-1$$
• The same concept is related to the Winograd algorithm.
Linear Convolution
Circular Convolution with N = L+P-1
Time aliasing in the circular convolution of two finite-length sequences can be avoided if N ≥ L+P-1.
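A sketch that checks the N ≥ L+P-1 condition numerically (function names and test sequences are illustrative): with N = L+P-1 the circular convolution of the zero-padded sequences equals the linear one, while a shorter N wraps the tail around.

```python
def circular_convolution(x1, x2, n_pts):
    """N-point circular convolution of x1 and x2 after zero-padding both to length N."""
    a = list(x1) + [0] * (n_pts - len(x1))
    b = list(x2) + [0] * (n_pts - len(x2))
    return [sum(a[m] * b[(n - m) % n_pts] for m in range(n_pts)) for n in range(n_pts)]


def linear_convolution(x1, x2):
    out = [0] * (len(x1) + len(x2) - 1)
    for i, u in enumerate(x1):
        for j, v in enumerate(x2):
            out[i + j] += u * v
    return out


x1 = [1, 2, 3]        # L = 3
x2 = [1, 1, 1, 1]     # P = 4, so L + P - 1 = 6
print(linear_convolution(x1, x2))        # prints [1, 3, 6, 6, 5, 3]
print(circular_convolution(x1, x2, 6))   # prints [1, 3, 6, 6, 5, 3]: no time aliasing
print(circular_convolution(x1, x2, 4))   # prints [6, 6, 6, 6]: the tail wraps around
```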
Concluding Remarks
• The convolution of two finite-length sequences can be computed as a circular convolution of sufficiently large length.
• Circular convolution can be implemented by the DFT/FFT.
• However, in real applications:
– For an FIR system, the input sequence is of indefinite duration.
– Storing the entire input signal before processing would require a large processing delay and an indefinite amount of memory.
– Solution: block convolution
Block Convolution
• Step 1: Segment the input sequence into sections of length L.
• Step 2: Convolve each section with the finite-length impulse response of length P by using a DFT/FFT of length N = L+P-1.
• Step 3: Fit the filtered sections together in an appropriate way:
– Overlap-add method
– Overlap-save method
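The lecture names the overlap-save method without detailing it. The sketch below follows the common textbook formulation, which is an assumption here: each N-point input block (N = L+P-1) reuses the last P-1 samples of the previous block, the first P-1 outputs of each circular convolution are time-aliased and discarded, and the remaining L outputs are kept. A direct circular convolution stands in for the FFT, multiply, IFFT step to keep the sketch short; all names are hypothetical.

```python
def circular_convolution(a, b):
    """N-point circular convolution (stand-in for FFT -> multiply -> IFFT)."""
    n_pts = len(a)
    return [sum(a[m] * b[(n - m) % n_pts] for m in range(n_pts)) for n in range(n_pts)]


def overlap_save(x, h, block_len):
    """Filter x with the FIR h (length P) by overlap-save with L = block_len."""
    p = len(h)
    n_pts = block_len + p - 1                   # N = L + P - 1
    h_pad = list(h) + [0] * (n_pts - p)
    total = len(x) + p - 1                      # length of the full linear convolution
    # P-1 leading zeros (no history yet), then x, then enough trailing zeros.
    padded = [0] * (p - 1) + list(x) + [0] * (total + block_len)
    y = []
    start = 0
    while len(y) < total:
        block = padded[start:start + n_pts]     # overlaps the previous block by P-1
        y.extend(circular_convolution(block, h_pad)[p - 1:])  # drop aliased outputs
        start += block_len
    return y[:total]


y = overlap_save(list(range(1, 21)), [1, -2, 3], 5)   # same as the direct convolution
```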
Overlap-Add Method (y = h * x)
• Step 1: Segment x[n] into sections of length L, and zero-pad each section and h[n] to length N = L+P-1.
(Figure: x[n], h[n], and the zero-padded sections.)
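Steps 2 and 3
• Step 2: With the r-th section $x_r[n] = x[n + rL]$, 0 ≤ n ≤ L-1, each filtered section is
$$y_r[n] = x_r[n] * h[n] = x_r[n] \circledast_N h[n]$$
with length L+P-1; the N-point circular convolution equals the linear convolution because N = L+P-1.
• Step 3: Time-shift each $y_r[n]$ by rL samples and add the overlapping parts:
$$y[n] = \sum_{r} y_r[n - rL]$$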
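A minimal overlap-add sketch following Steps 1 to 3. As above, a direct N-point circular convolution stands in for the FFT, multiply, IFFT step, and the function names are hypothetical.

```python
def circular_convolution(a, b):
    """N-point circular convolution (stand-in for FFT -> multiply -> IFFT)."""
    n_pts = len(a)
    return [sum(a[m] * b[(n - m) % n_pts] for m in range(n_pts)) for n in range(n_pts)]


def overlap_add(x, h, block_len):
    """Filter x with the FIR h (length P) by overlap-add with L = block_len."""
    p = len(h)
    n_pts = block_len + p - 1                    # N = L + P - 1
    h_pad = list(h) + [0] * (n_pts - p)          # Step 1: zero-pad h
    y = [0] * (len(x) + p - 1)
    for start in range(0, len(x), block_len):    # start = r * L
        section = list(x[start:start + block_len])
        section += [0] * (n_pts - len(section))  # Step 1: zero-pad x_r
        y_r = circular_convolution(section, h_pad)   # Step 2: equals x_r * h
        for n, v in enumerate(y_r):              # Step 3: shift by rL and add
            if start + n < len(y):
                y[start + n] += v
    return y


y = overlap_add(list(range(1, 21)), [1, -2, 3], 5)   # same as the direct convolution
```

Only the last P-1 samples of each y_r overlap the next section, so the additions in Step 3 are confined to those tails.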
Fast Convolution with the FFT
• Given two sequences x1 and x2 of length N1 and N2, respectively:
– Direct implementation requires N1N2 complex multiplications.
• Consider using the FFT to convolve the two sequences:
– Pick N, a power of 2, such that N ≥ N1+N2-1.
– Zero-pad x1 and x2 to length N.
– Compute the N-point FFTs of the zero-padded x1 and x2 to obtain X1 and X2.
– Multiply X1 and X2.
– Apply the IFFT to obtain the convolution sum of x1 and x2.
– Computational complexity (two FFTs, N products, one IFFT): 2(N/2)log2N + N + (N/2)log2N complex multiplications
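The steps above can be sketched in pure Python with a small recursive radix-2 FFT. This is an illustrative implementation under the assumption of real input sequences, not an optimized one; the names `fft`, `ifft`, and `fast_convolution` are hypothetical.

```python
import cmath


def fft(a):
    """Recursive radix-2 FFT (len(a) must be a power of 2)."""
    n = len(a)
    if n == 1:
        return [complex(a[0])]
    even, odd = fft(a[0::2]), fft(a[1::2])
    out = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * cmath.pi * k / n) * odd[k]   # twiddle factor W_N^k
        out[k] = even[k] + t
        out[k + n // 2] = even[k] - t
    return out


def ifft(A):
    """Inverse FFT through the forward FFT and conjugation."""
    n = len(A)
    return [v.conjugate() / n for v in fft([c.conjugate() for c in A])]


def fast_convolution(x1, x2):
    """Linear convolution of two real sequences via zero-padded FFTs."""
    length = len(x1) + len(x2) - 1
    n = 1
    while n < length:                          # N: power of 2, N >= N1 + N2 - 1
        n *= 2
    X1 = fft(list(x1) + [0] * (n - len(x1)))   # zero-pad, then transform
    X2 = fft(list(x2) + [0] * (n - len(x2)))
    y = ifft([p * q for p, q in zip(X1, X2)])  # pointwise product, then IFFT
    return [v.real for v in y[:length]]        # real inputs assumed


y = fast_convolution([1, 2, 3], [1, 1, 1, 1])  # close to [1, 3, 6, 6, 5, 3]; N = 8
```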
Example
• A sequence x[n] of length 1024 and an FIR filter h[n] of length 34
• Direct computation: 34 × 1024 = 34816 multiplications
• Using a radix-2 FFT (N = 2048): 35840 multiplications
• Using overlap-add with a radix-2 FFT:
– x[n] is segmented into contiguous blocks of equal length 95 (95 + 34 - 1 = 128), which gives 11 blocks.
– Apply a radix-2 FFT of length 128.
– Each segment requires 1472 multiplications.
– The algorithm requires 11 × 1472 = 16192 multiplications in total.
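The multiplication counts in this example follow from the complexity formula of the previous slide; the short script below reproduces them (helper names are hypothetical). Note that the per-segment figure of 1472 is the same formula with N = 128, so it counts a fresh FFT of h for every segment.

```python
import math


def fft_conv_mults(n):
    """Two N-point FFTs, N products, one IFFT: 2(N/2)log2N + N + (N/2)log2N."""
    stages = int(math.log2(n))
    return 2 * (n // 2) * stages + n + (n // 2) * stages


def overlap_add_mults(n_x, n_h, block_len):
    """(number of segments, multiplications per segment, total multiplications)."""
    n = block_len + n_h - 1                 # FFT length per segment
    segments = math.ceil(n_x / block_len)
    return segments, fft_conv_mults(n), segments * fft_conv_mults(n)


print(1024 * 34)                            # prints 34816 (direct)
print(fft_conv_mults(2048))                 # prints 35840
print(overlap_add_mults(1024, 34, 95))      # prints (11, 1472, 16192)
```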
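Decimation in Time
• Split x[n] into even-indexed samples (n = 2ℓ) and odd-indexed samples (n = 2ℓ+1). Using $W_N^2 = W_{N/2}$,
$$X[k] = \sum_{\ell=0}^{N/2-1} x[2\ell]\, W_{N/2}^{\ell k} + W_N^{k} \sum_{\ell=0}^{N/2-1} x[2\ell+1]\, W_{N/2}^{\ell k} = G[k] + W_N^{k} H[k]$$
where G[k] and H[k] are N/2-point DFTs and $W_N^{k}$ is the twiddle factor.
• Cost: N + 2(N/2)^2 complex multiplications vs. N^2 complex multiplications for the direct DFT.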
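A sketch of one decimation-in-time step (hypothetical names): the two N/2-point DFTs are computed directly, and the twiddle factors combine them as X[k] = G[k] + W_N^k H[k]. The returned counts reproduce N + 2(N/2)^2 versus N^2; for N = 8 that is 40 versus 64.

```python
import cmath


def dft(x):
    """Direct DFT; returns (X, number of complex multiplications = N^2)."""
    n = len(x)
    X = [sum(v * cmath.exp(-2j * cmath.pi * k * m / n) for m, v in enumerate(x))
         for k in range(n)]
    return X, n * n


def dit_one_level(x):
    """One DIT step: X[k] = G[k] + W_N^k H[k], with G, H periodic in k (period N/2)."""
    n = len(x)
    half = n // 2
    G, mults_g = dft(x[0::2])    # even-indexed samples, n = 2l
    H, mults_h = dft(x[1::2])    # odd-indexed samples, n = 2l + 1
    X = [G[k % half] + cmath.exp(-2j * cmath.pi * k / n) * H[k % half] for k in range(n)]
    return X, mults_g + mults_h + n   # 2(N/2)^2 + N


X_direct, m_direct = dft([1, 2, 3, 4, 5, 6, 7, 8])          # m_direct = 64
X_dit, m_dit = dit_one_level([1, 2, 3, 4, 5, 6, 7, 8])      # m_dit = 40
```

Applying the same split recursively to G and H is what leads to the log2N-stage FFT.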
Flow Graph of the DIT FFT
(Figure: flow graph of an 8-point DIT FFT.)
Remarks
• It requires v = log2N stages. Each stage has N/2 butterfly operations (radix-2 DIT FFT), each of which requires 2 complex multiplications and 2 complex additions.
• Hence each stage has N complex multiplications and N complex additions.
• The number of complex multiplications (as well as additions) is equal to N log2N.
• By the symmetry property
$$W_N^{r+N/2} = W_N^{N/2}\, W_N^{r} = e^{-j\pi}\, W_N^{r} = -W_N^{r}$$
the butterfly operation is simplified from 2 complex multiplications and 2 complex additions to 1 complex multiplication and 2 complex additions.
In-Place Computation
(Figure: 8-point FFT flow graph with normal-order and bit-reversed-order indices; the register arrays X0[000], ..., X0[111] are updated by Stage 1, Stage 2, and Stage 3 into X1, X2, and finally X3[000], ..., X3[111].)
The same register array can be used in each stage.
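Normal-Order Sorting vs. Bit-Reversed Sorting
(Figure: the index sequence in normal order is repeatedly split into even-indexed samples (top) and odd-indexed samples (bottom), which yields bit-reversed order.)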
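A sketch of the two views of the sorting (hypothetical names): reversing the bits of each index, and recursively moving even-indexed items to the top and odd-indexed items to the bottom, produce the same order.

```python
def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of index i (e.g. 001 -> 100 for bits = 3)."""
    r = 0
    for _ in range(bits):
        r = (r << 1) | (i & 1)   # move the lowest bit of i into r
        i >>= 1
    return r


def even_odd_sort(seq):
    """Recursively place even-indexed items on top and odd-indexed items at the bottom."""
    if len(seq) <= 1:
        return list(seq)
    return even_odd_sort(seq[0::2]) + even_odd_sort(seq[1::2])


print([bit_reverse(i, 3) for i in range(8)])   # prints [0, 4, 2, 6, 1, 5, 3, 7]
print(even_odd_sort(list(range(8))))           # prints [0, 4, 2, 6, 1, 5, 3, 7]
```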
DFT vs. Radix-2 FFT
• DFT: N^2 complex multiplications and N(N-1) complex additions
• Recall that each (simplified) butterfly operation requires one complex multiplication and two complex additions.
• FFT: (N/2)log2N complex multiplications and N log2N complex additions
• In-place computation: the input and output nodes of each butterfly operation are horizontally adjacent, so only one storage array is required.
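An iterative sketch of the radix-2 DIT FFT that works in place on one Python list, as described above: bit-reversed sorting of the input, then log2N stages of N/2 simplified butterflies, each with one twiddle multiplication and two additions. The function name is hypothetical; twiddle factors are generated by repeated multiplication here, whereas the implementation notes below mention a look-up table or CORDIC rotator, and those generating multiplications are not counted.

```python
import cmath


def fft_in_place(a):
    """Radix-2 DIT FFT computed in place in list `a`; returns the butterfly count.

    The returned number equals the twiddle multiplications, (N/2) log2 N.
    """
    n = len(a)
    bits = n.bit_length() - 1
    if n != 1 << bits:
        raise ValueError("N must be a power of 2")
    # Bit-reversed sorting of the input (only needed for N > 1)
    for i in range(n):
        j = int(format(i, f"0{bits}b")[::-1], 2) if bits else 0
        if i < j:
            a[i], a[j] = a[j], a[i]
    mults = 0
    size = 2
    while size <= n:                                 # one stage per pass
        w_step = cmath.exp(-2j * cmath.pi / size)
        for start in range(0, n, size):
            w = 1
            for k in range(size // 2):
                top, bot = start + k, start + k + size // 2
                t = w * a[bot]                       # 1 complex multiplication
                mults += 1
                a[top], a[bot] = a[top] + t, a[top] - t   # 2 complex additions
                w *= w_step                          # next twiddle (not counted)
        size *= 2
    return mults


data = [complex(v) for v in [1, 2, 3, 4, 0, 0, 0, 0]]
count = fft_in_place(data)     # data now holds X[0..7]; count = (8/2) * 3 = 12
```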
Decimation in Frequency (DIF)
• Recall that the DFT is
$$X[k] = \sum_{n=0}^{N-1} x[n]\, W_N^{nk}, \quad 0 \le k \le N-1$$
• The DIT FFT algorithm is based on decomposing the DFT computation by forming smaller subsequences in the time-domain index n: n = 2ℓ or n = 2ℓ+1.
• Alternatively, one can divide the output sequence X[k], in the frequency domain, into smaller subsequences: k = 2r or k = 2r+1.
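DIF FFT Algorithm (1)
• Split the sum at n = N/2 and substitute n + N/2 for n in the second half. For the even-indexed outputs k = 2r, since $W_N^{2r(n+N/2)} = W_N^{2rn} = W_{N/2}^{rn}$,
$$X[2r] = \sum_{n=0}^{N/2-1} \left(x[n] + x[n+N/2]\right) W_{N/2}^{rn}, \quad r = 0, 1, \ldots, N/2-1$$
which is just an N/2-point DFT. Similarly,
DIF FFT Algorithm (2)
• For the odd-indexed outputs k = 2r+1, since $W_N^{(2r+1)N/2} = -1$,
$$X[2r+1] = \sum_{n=0}^{N/2-1} \left(x[n] - x[n+N/2]\right) W_N^{n}\, W_{N/2}^{rn}$$
• There are v = log2N stages, and each stage has N/2 butterfly operations.
• (N/2)log2N complex multiplications and N log2N complex additions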
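A recursive sketch of the DIF decomposition (the function name is hypothetical). Each level forms x[n] + x[n+N/2] for the even outputs and (x[n] - x[n+N/2]) W_N^n for the odd outputs. A hardware or in-place version would leave the outputs in bit-reversed order; this sketch interleaves them back into normal order for readability.

```python
import cmath


def fft_dif(x):
    """Recursive radix-2 decimation-in-frequency FFT (len(x) a power of 2)."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    half = n // 2
    upper = [x[i] + x[i + half] for i in range(half)]               # feeds X[2r]
    lower = [(x[i] - x[i + half]) * cmath.exp(-2j * cmath.pi * i / n)
             for i in range(half)]                                  # feeds X[2r+1]
    out = [0j] * n
    out[0::2] = fft_dif(upper)     # N/2-point DFT -> even-indexed outputs
    out[1::2] = fft_dif(lower)     # N/2-point DFT -> odd-indexed outputs
    return out


X = fft_dif([1, 2, 3, 4, 5, 6, 7, 8])   # same values as the direct 8-point DFT
```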
Remarks
• The basic butterfly operations of the DIT FFT and the DIF FFT are a transposed-form pair.
• The I/O values of the DIT FFT and the DIF FFT are the same.
• Applying the transpose transform to each DIT FFT algorithm, one obtains a DIF FFT algorithm.
(Figure: DIF butterfly unit and DIT butterfly unit.)
Implementation Issues
• Radix-2, Radix-4, Radix-8, Split-Radix, Radix-2^2, …
• I/O indexing
• In-place computation
– Bit-reversed sorting is necessary.
– Efficient use of memory
– Random (non-sequential) memory access; an address generator unit is required.
– Good for a cascade form: FFT followed by IFFT (or vice versa), e.g., the fast convolution algorithm
• Twiddle factors
– Look-up table
– CORDIC rotator
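A floating-point sketch of the CORDIC-rotator option for twiddle factors. The iteration count, the quadrant pre-rotation by exact 90-degree steps, and all names are assumptions made for illustration; a hardware rotator would use fixed-point arithmetic, where each multiplication by 2^-i is a right shift.

```python
import math

ITERATIONS = 16   # assumed; accuracy is roughly atan(2^-15) radians
# Micro-rotation angles atan(2^-i) and the overall CORDIC gain correction K.
ANGLES = [math.atan(2.0 ** -i) for i in range(ITERATIONS)]
K = 1.0
for _i in range(ITERATIONS):
    K /= math.sqrt(1.0 + 2.0 ** (-2 * _i))


def cordic_rotate(x, y, angle):
    """Rotate (x, y) by `angle` radians (|angle| <= pi/4) with shift-and-add steps."""
    z = angle
    for i in range(ITERATIONS):
        d = 1.0 if z >= 0 else -1.0
        # In hardware, multiplying by 2^-i is a right shift by i bits.
        x, y = x - d * y * 2.0 ** -i, y + d * x * 2.0 ** -i
        z -= d * ANGLES[i]
    return x * K, y * K


def twiddle_multiply(re, im, k, n):
    """Compute (re + j im) * W_N^k, W_N = exp(-j 2 pi / N), with a CORDIC rotator."""
    angle = -2.0 * math.pi * k / n
    q = round(angle / (math.pi / 2))            # nearest multiple of 90 degrees
    residual = angle - q * (math.pi / 2)        # |residual| <= pi/4
    for _ in range(q % 4):                      # exact 90-degree steps: multiply by j
        re, im = -im, re
    return cordic_rotate(re, im, residual)


re, im = twiddle_multiply(2.0, 1.0, 3, 8)   # approx. (2 + j) * exp(-j 3 pi / 4)
```

The 90-degree pre-rotation keeps the residual angle inside the CORDIC convergence range, so only one rotator design is needed for every twiddle factor.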
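FIR Filters
Time-domain (matrix) form of a 2-tap filter h applied to a 3-sample input x:
$$\begin{bmatrix} y_0 \\ y_1 \\ y_2 \\ y_3 \end{bmatrix} = \begin{bmatrix} h_0 & 0 & 0 \\ h_1 & h_0 & 0 \\ 0 & h_1 & h_0 \\ 0 & 0 & h_1 \end{bmatrix} \begin{bmatrix} x_0 \\ x_1 \\ x_2 \end{bmatrix}$$
Transform-domain form: Y(z) = H(z)X(z).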
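Example: Linear-Phase FIR
• Linear-phase FIR filter: approximately constant frequency-response magnitude and linear phase (constant group delay) in the passband; its coefficients are symmetric (or antisymmetric) about the center, e.g., h[k] = h[N-1-k].
• N-tap direct form: N multipliers and N-1 adders
• By exploiting this substructure sharing (adding the two samples that share a coefficient before multiplying), the area is reduced to:
– (N+1)/2 multipliers and N-1 adders, if N is odd
– N/2 multipliers and N-1 adders, if N is even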
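A sketch of the folded structure for a symmetric h (the function name is hypothetical; antisymmetric filters would subtract instead of add). The two samples that share a coefficient are pre-added, so each output needs ceil(N/2) multiplications, while the total remains N-1 additions per output.

```python
def linear_phase_fir(x, h):
    """FIR filter for symmetric h (h[k] == h[N-1-k]) using pre-addition.

    Returns (outputs, multiplications per output = ceil(N/2)).
    """
    taps = len(h)
    if any(h[k] != h[taps - 1 - k] for k in range(taps)):
        raise ValueError("h must be symmetric")
    half = (taps + 1) // 2                  # (N+1)/2 for odd N, N/2 for even N

    def sample(n):                          # x[n], zero outside the input
        return x[n] if 0 <= n < len(x) else 0

    y = []
    for n in range(len(x) + taps - 1):
        acc = 0
        for k in range(half):
            mirror = taps - 1 - k
            if mirror == k:                 # middle tap of an odd-length filter
                acc += h[k] * sample(n - k)
            else:                           # shared coefficient: pre-add, then multiply
                acc += h[k] * (sample(n - k) + sample(n - mirror))
        y.append(acc)
    return y, half


y, mults = linear_phase_fir([1, 0, -1, 4, 2], [1, 2, 3, 2, 1])   # mults = 3 for N = 5
```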
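An Efficient Decomposition
• Example: 2-fold decomposition
$$H(z) = h[0] + h[1]z^{-1} + h[2]z^{-2} + h[3]z^{-3} + h[4]z^{-4} + h[5]z^{-5} + h[6]z^{-6}$$
$$= \left(h[0] + h[2]z^{-2} + h[4]z^{-4} + h[6]z^{-6}\right) + z^{-1}\left(h[1] + h[3]z^{-2} + h[5]z^{-4}\right) = H_0(z^2) + z^{-1}H_1(z^2)$$
• Example: 3-fold decomposition
$$H(z) = \left(h[0] + h[3]z^{-3} + h[6]z^{-6}\right) + z^{-1}\left(h[1] + h[4]z^{-3}\right) + z^{-2}\left(h[2] + h[5]z^{-3}\right) = H_0(z^3) + z^{-1}H_1(z^3) + z^{-2}H_2(z^3)$$
• General case (N-fold decomposition):
$$H(z) = \sum_{k} h[k]\, z^{-k} = \sum_{l=0}^{N-1} z^{-l} H_l(z^N), \quad \text{where } H_l(z) = \sum_{k} h[Nk + l]\, z^{-k}$$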
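A sketch of the polyphase split (hypothetical names; the fold factor, called N on this slide, is `fold` in the code). `recombine` evaluates $\sum_l z^{-l} H_l(z^{fold})$ at a chosen value of $z^{-1}$ so the identity can be checked numerically.

```python
def polyphase(h, fold):
    """Polyphase components H_l, l = 0..fold-1, with taps h[fold*k + l]."""
    return [h[l::fold] for l in range(fold)]


def evaluate(coeffs, zinv):
    """sum_k coeffs[k] * zinv**k, where zinv stands for z^-1."""
    return sum(c * zinv ** k for k, c in enumerate(coeffs))


def recombine(components, zinv):
    """sum_l z^-l H_l(z^fold), evaluated at z^-1 = zinv."""
    fold = len(components)
    return sum(zinv ** l * evaluate(hl, zinv ** fold) for l, hl in enumerate(components))


taps = [0, 1, 2, 3, 4, 5, 6]          # tap index used as the tap value: h[k] = k
print(polyphase(taps, 2))             # prints [[0, 2, 4, 6], [1, 3, 5]]
print(polyphase(taps, 3))             # prints [[0, 3, 6], [1, 4], [2, 5]]
```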
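Traditional Parallel Architecture
• 2-fold parallel architecture, with all terms functions of $z^2$:
$$Y_0 = H_0X_0 + z^{-2}H_1X_1, \qquad Y_1 = H_0X_1 + H_1X_0$$
• Four N/2-tap sub-filters: 4(N/2) multiplications and 4(N/2-1)+2 additions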
Traditional Parallel FIR
An L-parallel FIR filter of length N, built from L^2 sub-filters of length N/L, requires:
1. L^2(N/L) multiplications, i.e., LN multiplications
2. L^2(N/L - 1) + L(L-1) additions, i.e., L(N-1) additions
That is, about LN multiply-add operations per block of L outputs.
Fast FIR Algorithm (FFA)
• First, apply an L-fold polyphase decomposition to H(z):
– There are L sub-filters of length N/L.
• Then apply the Winograd algorithm:
– Two polynomials of degree L-1, whose coefficients are the polyphase components, can be multiplied by using 2L-1 product terms.
– Each product term is equivalent to a filtering operation in the block formulation.
– Consequently, the filter can be realized using approximately 2L-1 FIR sub-filters of length N/L, which requires about (2L-1)(N/L) = 2N - N/L multiplications.
FIR Using Polyphase Decomposition
2-Parallel Fast FIR Filter
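2-Parallel FFA
• With all terms functions of $z^2$, the 2-parallel fast FIR filter computes
$$Y_0 = H_0X_0 + z^{-2}H_1X_1, \qquad Y_1 = (H_0 + H_1)(X_0 + X_1) - H_0X_0 - H_1X_1$$
• It requires 3 distinct sub-filters of length N/2 and 4 pre/post-processing additions.
• In total, it requires 1.5N multiplications and 3(N/2 - 1) + 4 = 1.5N + 1 additions.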