
Meeting 13

Summer 2009 Doing DSP Workshop

Today:

◮ Admin comments.

◮ Decimation in time DFT.

◮ Other fast algorithms.

◮ An old friend.

One graphic from TI materials.

Learn all you can from the mistakes of others. You won’t have time to make them

all yourself. — Alfred Sheinwold

Doing DSP Workshop – Summer 2009 Meeting 13 – Page 1/46 Tuesday – June 16, 2009


Projects

Audio waveform synthesizer –

sine, square wave, triangle, etc.

◮ Darin Rajabian

OFDM.

◮ Yu Wang

Motor speed control lab demonstration.

◮ Zharori Cong

◮ B.K. Kim

Remote camera using ZigBee.

◮ James Kim

◮ Jordan Adams

Digital Filter Study.

◮ Vindhya Reddy

◮ Joanna Widjaja

Ultrasonic Vision Aide.

◮ Ronald Deang

Not cast in concrete.


Suggested Project Phases

◮ Start up.

◮ Basically define the task, locate useful resources, and verbalize

a possible plan of attack.

◮ Initial Start.

◮ Develop the initial proposal. If applicable, do MATLAB

simulation. Identify required parts and other resources needed

to be purchased. Should have a reasonably clear understanding

of what is to be done and how. Set up goals and time line.

◮ Work in earnest.

◮ Program, build, debug. Repeat.

◮ Completion. Sometime in August.

◮ Demonstration to the workshop.

◮ Poster.

Feel free to use Chih-Wei and me as resources.


Updated tentative schedule

Week of June 15: Exercise 5, controlSTICK ADC, DAC, transfer measurements.

Tuesday – Fast DFTs.

Thursday – Xilinx 8-bit PicoBlaze microcomputer (VHDL).

Week of June 22: Exercise 6, real-time FFT and waveform evaluation.

Tuesday – TBD. KM away.

Thursday – TBD. KM away.

Weeks following —

Lecture and lab complete, focus on projects.


Lab floor to be done this week

The floors in EECS 4341 are scheduled to be stripped and waxed this

week.

Except for the tables we got everything up off of the floor on Friday.

We have kept the lab functional by moving the computers onto the tables

to retain access.


Lab tables to be replaced next week.

Friday all of the computers will be shut down and placed in

temporary storage.

The current lab tables will be removed and be replaced with

“real” tables with built-in shelving. This hopefully will be done

early in the week.

Once the new lab tables are in place the computers, scopes and

signal generators will be put onto the new benches. Hopefully

the lab will be fully operational by the end of the week.


Today

◮ Fast DFT algorithms.

◮ Some observations on the C28x FFT support.

◮ A useful C version of the FFT.


“The” Fast Fourier Transform (FFT) Algorithm

There are many fast algorithms (FFTs) that can be used to

compute the Discrete Fourier Transform (DFT). The DFT is

defined as

$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j2\pi kn/N}, \qquad k = 0, 1, \ldots, N-1.$$

The nominal computational cost is $N^2$ complex MACs.

Any algorithm that significantly reduces this number can be

considered as being fast.

There are many fast algorithms. Some algorithms are faster

than others.

The metric by which to judge algorithms is not always clear.


Many ways of computing the DFT

The paper An Algorithm for the Machine Computation of Complex Fourier Series by Cooley and Tukey in 1965 was the first “modern” (or should we say early computer period?) publication of a fast algorithm for computing the DFT. This paper triggered the development of a large number of alternative procedures.

The FFT was first discovered by Gauss in 1805. It was used to calculate the orbit of an asteroid. It was found in one of his workbooks, written in Latin. But that’s another story.

Some DFT algorithms:

◮ brute force

◮ Singleton’s DFT speed-up procedure

◮ Goertzel algorithm

◮ decimation in time

◮ decimation in frequency

◮ other radix algorithms: four, eight, split radix

◮ Winograd’s short length convolution algorithm

◮ prime factor method (Good-Thomas)

◮ Winograd Fourier transform algorithm (WFTA)


Need and capability

Everything has its time.

Richard Garwin had a need (nuclear monitoring).

John Tukey had an idea how to solve it.

James Cooley coded it up and made it work.

Computers were just then coming into general use.

And, of course, Gauss did “it” first.

Good/Thomas published the Prime Factor Algorithm earlier.

The Chinese Remainder Theorem is very, very old.

Almost every implementor has a different view and shares it.

There are well over 3000 publications about the FFT.

More appear to be generated almost continuously.

Who knows how many publications use or mention it?


Concepts important to fast DFT algorithms

Roots of unity, powers of $W_N = e^{-j2\pi/N}$.

Symmetry of the sine and cosine.

Index mappings.

Matrix Kronecker products.


Performance characterization constantly changes

Early effort largely minimized multiplication.

This evolved into minimizing the number of arithmetic operations.

Using today’s processors the goal is largely to minimize data movement.

When implementing an FFT on an ASIC, arithmetic becomes important again.

Almost always can trade between memory and execution time.

How does one do a gigapoint FFT?

How to exploit parallelism?

Bit serial arithmetic versions exist.

N-specific FFT code generators exist.

Can be pipelined.

Is there a lower bound on computational cost?


The decimation-in-time radix-2 FFT

◮ N is assumed to be an integer power of 2.

◮ Divide the x[n] into two N/2 value sets based on even/odd index values.

◮ Form the DFT of each set and combine results to form the N value DFT.

◮ Repeat the procedure on each of the N/2 value DFTs.

◮ And so on.

The resulting nominal complex MAC count is $\frac{N}{2} \times \log_2(N)$.

   N    log2(N)   (N/2) × log2(N)        N^2
  64       6            192             4096
 128       7            448            16384
 256       8           1024            65536
 512       9           2304           262144
1024      10           5120          1048576
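The table entries can be regenerated with a few lines of C. This is a small sketch for checking the arithmetic; the function names are made up for illustration and N is assumed to be a power of two.

```c
/* Nominal complex MAC count of a radix-2 FFT: (N/2) * log2(N).
 * N is assumed to be a power of two. (Illustrative helper.) */
unsigned long fft_mac_count(unsigned long N)
{
    unsigned long log2n = 0;
    for (unsigned long m = N; m > 1; m >>= 1)
        log2n++;                      /* count how many times N halves to 1 */
    return (N / 2) * log2n;
}

/* Nominal complex MAC count of the brute-force DFT: N * N. */
unsigned long dft_mac_count(unsigned long N)
{
    return N * N;
}
```

For N = 512 this gives 2304 against 262144, a ratio of roughly 114.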


Separating the even and odd indexed samples

Start with the forward transform equation

$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j2\pi kn/N}, \qquad k = 0, 1, \ldots, N-1.$$

Even numbers have the form $2p$ and odd numbers have the form $2q+1$, where $p$ and $q$ go from $0, 1, 2, \ldots, N/2-1$.

$$\begin{aligned}
X[k] &= \sum_{p=0}^{N/2-1} x[2p]\, e^{-j2\pi k\,2p/N} + \sum_{q=0}^{N/2-1} x[2q+1]\, e^{-j2\pi k(2q+1)/N} \\
     &= \sum_{p=0}^{N/2-1} x[2p]\, e^{-j2\pi kp/(N/2)} + e^{-j2\pi k/N} \sum_{q=0}^{N/2-1} x[2q+1]\, e^{-j2\pi kq/(N/2)}.
\end{aligned}$$

We now have a weighted sum of two N/2 value DFTs. Repeat the process.
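The weighted sum can be restated as a pair of outputs per twiddle factor. Here $X_e[k]$ and $X_o[k]$ denote the two $N/2$ value DFTs of the even and odd indexed samples, and $W_N = e^{-j2\pi/N}$. Both half-length DFTs are periodic in $k$ with period $N/2$, and $W_N^{k+N/2} = -W_N^k$. This is a restatement under those definitions, not an additional result:

$$X[k] = X_e[k] + W_N^k\, X_o[k], \qquad X[k + N/2] = X_e[k] - W_N^k\, X_o[k], \qquad k = 0, 1, \ldots, N/2 - 1.$$

Each pair shares one complex multiplication, which is where the factor of $N/2$ per level in the MAC count comes from.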


The signal flow graph

[Figure: signal flow graph for N = 8. The even indexed inputs x[0], x[2], x[4], x[6] form Xe[0] through Xe[3] and the odd indexed inputs x[1], x[3], x[5], x[7] form Xo[0] through Xo[3]. Adders combine these into X[0] through X[7] using the twiddle factors W_N^0 through W_N^7.]

Exploiting symmetry

[Figure: the same N = 8 signal flow graph, from Xe[0..3] and Xo[0..3] to X[0..7], redrawn so that only the twiddle factors W_N^0 through W_N^3 are needed.]

Repeat until done

[Figure: complete signal flow graph of the 8-point decimation-in-time FFT from x[0..7] to X[0..7], with all twiddle factors written as powers of W_8 and subtractions marked on the lower butterfly branches.]

Butterflies and bit reverse addresses

If one can do two butterflies simultaneously then an algorithm exists that allows in/out normal ordering and in-place computation.

normal    bit reverse
000       000
001       100
010       010
011       110
100       001
101       101
110       011
111       111

From Wikipedia.
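The table above can be generated by shifting bits out of one word and into another. A minimal sketch, with `bit_reverse` as an illustrative name rather than a function from the workshop code:

```c
/* Reverse the low `bits` bits of `value` (for 3 bits: 001 -> 100).
 * Illustrative helper; any bits above `bits` are discarded. */
unsigned bit_reverse(unsigned value, unsigned bits)
{
    unsigned reversed = 0;
    for (unsigned b = 0; b < bits; b++) {
        reversed = (reversed << 1) | (value & 1u);  /* move the lowest bit across */
        value >>= 1;
    }
    return reversed;
}
```

Calling `bit_reverse(i, 3)` for i = 0 through 7 gives 0, 4, 2, 6, 1, 5, 3, 7, matching the right-hand column.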


Pseudo code

The computation can be organized using three loops: one each for level (or layer), group, and butterfly.

    nFFTs = N/2; FFTsize = 2;                 // R = log2(N) levels
    for (r = 0; r < R; r++) {
        for (fft = 0; fft < nFFTs; fft++) {
            for (butterfly = 0; butterfly < (FFTsize/2); butterfly++) {
                top_index = fft*FFTsize + butterfly;
                bot_index = top_index + (FFTsize/2);
                w_index = butterfly*nFFTs;
                temp = W[w_index]*data[bot_index];
                data[bot_index] = data[top_index] - temp; // update bot first!
                data[top_index] = data[top_index] + temp; // now update top
            }
        }
        nFFTs = (nFFTs/2); FFTsize = (FFTsize*2);
    }

The input values are assumed to have been reordered.


Perhaps speeding up the indexing

    nFFTs = N/2; FFTsize = 2;
    for (r = 0; r < R; r++) {
        FFTstart = 0;
        for (fft = 0; fft < nFFTs; fft++) {
            w_index = 0;
            for (butterfly = 0; butterfly < (FFTsize/2); butterfly++) {
                top_index = FFTstart + butterfly;
                bot_index = top_index + (FFTsize/2);
                temp = W[w_index]*data[bot_index];
                data[bot_index] = data[top_index] - temp; // update bot first!
                data[top_index] = data[top_index] + temp; // now update top
                w_index = w_index + nFFTs;
            }
            FFTstart = FFTstart + FFTsize;
        }
        nFFTs = (nFFTs>>1); FFTsize = (FFTsize<<1); // shifts are easy to do
    }

The input values are assumed to have been reordered.


Reordering the input

[Figure: 8-point DIT signal flow graph from x[0..7] to X[0..7] with the twiddle factors written per stage as powers of W_2, W_4, and W_8. The first-stage outputs are labeled by the input index pairs they combine: 0+4, 0-4, 2+6, 2-6, 1+5, 1-5, 3+7, 3-7.]

Continuing the reordering

[Figure: the 8-point DIT signal flow graph with intermediate nodes labeled a through h, shown with nodes c and d exchanged as the reordering continues.]

Reordered radix-8 DIT

[Figure: reordered signal flow graph of the 8-point DIT FFT from x[0..7] to X[0..7], with twiddle factors written as powers of W_8.]

Can start going the other way

[Figure: N = 8 signal flow graph from x[0..7] to X[0..7] with twiddle factors W_N^0 through W_N^3, as the starting point for working from the other direction.]

The radix-8 DIF FFT

[Figure: signal flow graph of the 8-point DIF FFT from x[0..7] to X[0..7], with twiddle factors written as powers of W_8.]

The flood gate was opened

Basically we exploited the way we wrote the indices of the values being

transformed and ended up with a fast algorithm.

We also got a “new” algorithm by manipulating the signal flow graph.

There are lots of ways to write indices and lots of ways to reorder the

data flow.

Which is “best”? What is meant by best?


How many ways are there to index?

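Numbers can be written as polynomials. For example we can write

$$1234_{10} = 1\times 10^3 + 2\times 10^2 + 3\times 10^1 + 4\times 10^0.$$

We refer to 10 as being the radix.

$$N = \sum_{k=0}^{D-1} d_k r^k$$

Similarly we can write numbers in binary form as

$$\begin{aligned}
1234_{10} &= 1\times 2^{10} + 0\times 2^9 + 0\times 2^8 + 1\times 2^7 + 1\times 2^6 + 0\times 2^5 \\
          &\quad + 1\times 2^4 + 0\times 2^3 + 0\times 2^2 + 1\times 2^1 + 0\times 2^0 \\
          &= 10011010010_2.
\end{aligned}$$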


A simple factoring of N

Numbers can also be written as the product of their factors.

For example

1234 = 2× 617.

Consider the number $N = N_1 N_2$ where $N_1$ and $N_2$ are relatively prime. It can be shown that we can uniquely write the integer values from 0 through $N-1$ as

$$n = n_2 N_1 + n_1, \qquad n_1 = 0, 1, \ldots, N_1 - 1, \quad n_2 = 0, 1, \ldots, N_2 - 1$$

or alternatively as

$$k = k_1 N_2 + k_2, \qquad k_1 = 0, 1, \ldots, N_1 - 1, \quad k_2 = 0, 1, \ldots, N_2 - 1.$$


FFT based only on simple factoring

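$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j2\pi kn/N},$$

$$\begin{aligned}
X[k_1N_2 + k_2] &= \sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} x[n_2N_1 + n_1]\, e^{-j2\pi (k_1N_2 + k_2)(n_2N_1 + n_1)/(N_1N_2)} \\
&= \sum_{n_1=0}^{N_1-1} \sum_{n_2=0}^{N_2-1} x[n_2N_1 + n_1]\, e^{-j2\pi (k_1n_1N_2 + k_2n_2N_1 + k_2n_1)/(N_1N_2)} \\
&= \sum_{n_1=0}^{N_1-1} e^{-j2\pi k_1n_1/N_1}\, e^{-j2\pi k_2n_1/N} \left[ \sum_{n_2=0}^{N_2-1} x[n_2N_1 + n_1]\, e^{-j2\pi k_2n_2/N_2} \right].
\end{aligned}$$

The $k_1 n_2 N_1 N_2$ term in the exponent is a multiple of $N$ and drops out.

Procedure: Form $N_1$ $N_2$-point DFTs.

Weight the results using twiddle factors.

Form $N_2$ $N_1$-point DFTs.

The number of complex multiplications is

$$N_1N_2^2 + N_1N_2 + N_2N_1^2 = N_1N_2(N_1 + 1 + N_2).$$

For $N = 15 = 3 \times 5$ compare $N^2 = 225$ to $N_1N_2(N_1 + 1 + N_2) = 135$.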


Prime Factor Algorithm index mapping

A more generalized mapping of the indices is

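$$n = ((K_1n_1 + K_2n_2))_N \quad \text{where} \quad 0 \le n_1 < N_1, \quad 0 \le n_2 < N_2$$

and

$$k = ((K_3k_1 + K_4k_2))_N \quad \text{where} \quad 0 \le k_1 < N_1, \quad 0 \le k_2 < N_2.$$

The $((\,\cdot\,))_N$ denotes using the quantity contained in the parentheses modulo $N$.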


Prime factor decomposition

$$((kn))_N = ((K_1K_3n_1k_1 + K_1K_4n_1k_2 + K_2K_3n_2k_1 + K_2K_4n_2k_2))_N$$

If values of $K_1$, $K_2$, $K_3$, and $K_4$ can be determined such that

$$((K_1K_4))_N = ((K_2K_3))_N = 0$$

then the DFT becomes

$$X[k] = \sum_{n_1=0}^{N_1-1} e^{-j2\pi k_1n_1K_1K_3/N} \sum_{n_2=0}^{N_2-1} x[n_1, n_2]\, e^{-j2\pi k_2n_2K_2K_4/N}.$$

Both the condition for generating 1-to-1 index maps and the above modulo relationship can be satisfied. The result is a mapping of a one-dimensional DFT into a two-dimensional DFT. For this case the number of complex multiplications is

$$N_2N_1^2 + N_1N_2^2.$$

For $N = 15 = 3 \times 5$ this gives 120 complex multiplications.
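To make the conditions concrete, the sketch below checks one possible choice of constants for N = 15. The values K1 = 5, K2 = 3, K3 = 10, K4 = 6 are an assumption chosen for illustration (they are not given in the slides), and the function names are hypothetical. With this choice both index maps are 1-to-1, the cross terms vanish, and the remaining exponents reduce to 3-point and 5-point DFT kernels.

```c
#include <stdbool.h>

/* Hypothetical constants for N = N1*N2 = 3*5 = 15 with
 *   n = (K1*n1 + K2*n2) mod N,  k = (K3*k1 + K4*k2) mod N. */
enum { N1 = 3, N2 = 5, N = 15, K1 = 5, K2 = 3, K3 = 10, K4 = 6 };

/* True if (Ka*i1 + Kb*i2) mod N hits every value 0..N-1 exactly once. */
bool map_is_one_to_one(int Ka, int Kb)
{
    bool seen[N] = { false };
    for (int i1 = 0; i1 < N1; i1++)
        for (int i2 = 0; i2 < N2; i2++) {
            int idx = (Ka * i1 + Kb * i2) % N;
            if (seen[idx])
                return false;         /* two index pairs collide */
            seen[idx] = true;
        }
    return true;
}

/* True if the cross terms vanish: (K1*K4) mod N == (K2*K3) mod N == 0. */
bool cross_terms_vanish(void)
{
    return (K1 * K4) % N == 0 && (K2 * K3) % N == 0;
}
```

Here (K1·K3) mod 15 = 5 = N2 and (K2·K4) mod 15 = 3 = N1, so the two exponents become $-j2\pi k_1n_1/3$ and $-j2\pi k_2n_2/5$ with no twiddle factors in between.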


PFA cost in terms of multiplications

For $N_f$ relatively prime factors of $N$ the number of complex multiplications becomes

$$N \sum_{i=0}^{N_f-1} N_i.$$

For N = 8184 = 3 × 8 × 11 × 31 the number of complex multiplications is 53 × 8184 as compared to the unmodified DFT which uses 8184 × 8184.

The PFA uses a factor of about 154 fewer.

If it were possible to use an 8192 value transform instead, a DIT FFT would nominally use (8192/2) × 13 complex multiplications. This is a factor of about 1260 fewer multiplications than needed by the unmodified DFT definition.
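The two ratios follow directly from the counts above, with $3 + 8 + 11 + 31 = 53$:

$$\frac{8184 \times 8184}{53 \times 8184} = \frac{8184}{53} \approx 154.4, \qquad \frac{8192 \times 8192}{(8192/2) \times 13} = \frac{2 \times 8192}{13} \approx 1260.3.$$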


C28x FFT and related functions

Started out with sprc081.zip and eventually ended up in c:\tidcs\c28\dsp_tbox. Moved to lab 6 directory.

Documented in FFT Library Module User’s Guide, C28x Foundation Software, contained in fft_mdl.pdf. Essential reading.

Execution cycles:

32-bit Real FFT
FFT size   Case 1: TF(Q31)   Case 2: TF(Q30)   Case 3: TF(Q30) & OTP
   128           6509              6763                 7017
   256          14756             15394                16032
   512          33081             34615                36149
  1024          73422             77004                80536

32-bit Complex FFT
FFT size   Case 1: TF(Q31)   Case 2: TF(Q30)   Case 3: TF(Q30) & OTP
   128          11159             11671                12183
   256          25901             27181                28461
   512          59075             62146                65217
  1024         132823            139991               147159


Time and storage

1024 real using 60 MHz clock takes about 1.2 ms.

32-bit 1024 real requires 2048 16-bit words for data.

1024 complex using 60 MHz clock takes about 2.2 ms.

32-bit 1024 complex requires 4096 16-bit words for data.

The total RAM on the C28017 controlSTICK is 6K 16-bit words.

TI functions use DIT with input in bit-reverse addressing form.

The TI functions do not scale as part of the transform.

TI does not provide an inverse FFT for the C28x.
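The times and storage figures can be checked by hand. The timing lines below assume the quoted times come from the Case 1 cycle counts for the 1024 point transforms, which is an inference rather than something stated on the slide:

$$\frac{73422 \text{ cycles}}{60 \times 10^6 \text{ cycles/s}} \approx 1.22 \text{ ms}, \qquad \frac{132823 \text{ cycles}}{60 \times 10^6 \text{ cycles/s}} \approx 2.21 \text{ ms}.$$

$$1024 \times \frac{32}{16} = 2048 \text{ words (real)}, \qquad 2 \times 1024 \times \frac{32}{16} = 4096 \text{ words (complex)}.$$

Against 6K = 6144 words of RAM, the 1024 point complex data alone takes two thirds of the memory.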


FFT input scaling

Consider a solitary sinusoidal input where B-bit sample values are placed into the low bits:

$$\cos(2\pi f_c t) = \frac{e^{j2\pi f_c t} + e^{-j2\pi f_c t}}{2}$$

For an N-value DFT the gain at the $f_c$ frequency (assuming it matches an analysis frequency) is N. If a 1024 point transform is taken then the result might require 10+B−1 bits.

Using C28017’s 12-bit samples the maximum amplitude FFT value is 2047 × 1024/2 = 1,048,064. This will fit in 21 bits.

Actually, a maximum amplitude complex DC input is the worst case input waveform.

Another “bad” waveform is a complex maximum amplitude square wave. The fundamental has amplitude 2/π instead of 1/2.
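The word-size estimate for the single sinusoid case can be checked with a short C sketch. The helper names are made up for illustration, and the bit count includes one sign bit for a two's complement value.

```c
/* Number of bits needed to hold `magnitude` as a two's complement value,
 * counting one sign bit. (Illustrative helper.) */
unsigned bits_with_sign(unsigned long magnitude)
{
    unsigned bits = 1;               /* the sign bit */
    while (magnitude != 0) {
        bits++;
        magnitude >>= 1;
    }
    return bits;
}

/* Peak DFT bin magnitude for a real sinusoid of amplitude `peak` that falls
 * exactly on an analysis frequency: peak * N / 2. */
unsigned long sinusoid_bin_peak(unsigned long peak, unsigned long N)
{
    return peak * N / 2;
}
```

With peak = 2047 and N = 1024 the bin value is 1,048,064, which needs 20 magnitude bits plus a sign bit, that is 10 + 12 − 1 = 21 bits.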


Scaling when taking the IDFT

For the 12-bit single sine wave using the DFT to compute the

IDFT will increase the word size by 10 bits. The result will fit

using a 32-bit word size. Simply transform then shift right by

10 bits.

Do we need 21 bits or 12 bits for the result? We started with 12.


What to do if there aren’t enough bits?

On a fixed point DSP computer floating point is not normally an option. When simulating floating point, performance takes a big hit.

One could scale the partial results down by a factor of 2 for each FFT layer. This is commonly done. It is conservative, often scaling values more than necessary, which costs noise performance.

A hybrid fixed point/floating point technique termed block floating point is often a viable option.

One can find code examples on the web both for TI and Motorola DSP devices for both scaling procedures. The block floating point scaling is well supported in the Motorola DSP devices.


Block floating point

I can write values as $m \times 2^c$ where $m$ is a two’s complement fraction having magnitude less than one and $c$ is a two’s complement integer.

0.25 can be written as $0.5 \times 2^{-1}$ or as $0.0625 \times 2^{2}$.

16 can be written as $0.5 \times 2^{5}$ or as $0.03125 \times 2^{9}$.

If we have an array of values all using the same value of $c$ we have a set of values referred to as being in block floating point form.

In order to keep values fractional they must be scaled such that the magnitude of the largest value is less than 1.

FFTs formed using block floating point are generally more accurate than fixed point FFTs and less accurate than equivalent floating point ones.

The DSP56303 has hardware support that allows block floating point FFTs to be formed only slightly slower than fixed point ones.

I don’t know how well the C5510 supports block floating point.
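The idea of one shared exponent per array can be sketched in C. This sketch uses `double` mantissas for clarity, which is an assumption for illustration. A fixed point DSP would hold the mantissas as two's complement fractions and apply the exponent with shifts. The name `to_block_float` is made up.

```c
#include <math.h>
#include <stddef.h>

/* Convert `values` to block floating point: every value shares one exponent c
 * and the mantissas m[i] = values[i] / 2^c all have magnitude less than one.
 * Returns c. (Illustrative helper using doubles as mantissas.) */
int to_block_float(const double *values, double *m, size_t count)
{
    double largest = 0.0;
    for (size_t i = 0; i < count; i++)
        if (fabs(values[i]) > largest)
            largest = fabs(values[i]);

    int c = 0;
    if (largest > 0.0)
        (void)frexp(largest, &c);    /* largest = f * 2^c with 0.5 <= f < 1 */

    for (size_t i = 0; i < count; i++)
        m[i] = ldexp(values[i], -c); /* values[i] * 2^(-c) */
    return c;
}
```

For the array {16, 0.25, −3} the shared exponent is 5 and the mantissas are 0.5, 0.0078125, and −0.09375. The small values lose headroom to the largest one, which is the accuracy cost relative to true floating point.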


Singleton’s DFT speed-up procedure

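In 1969 Singleton published a simple algorithm that reduces the number of multiply operations for DFTs by a factor of four.

$$\begin{aligned}
Z[k] &= \sum_{n=0}^{N-1} W_N^{kn}\, z[n] \\
     &= \sum_{n=0}^{N-1} \left( z_e[n] \cos(2\pi kn/N) - j\, z_o[n] \sin(2\pi kn/N) \right)
\end{aligned}$$

Here $z_e[n] = (z[n] + z[N-n])/2$ and $z_o[n] = (z[n] - z[N-n])/2$ are the even and odd parts of $z[n]$, with indices taken modulo $N$.

Write $Z[k]$ in terms of even and odd parts as $Z[k] = Z_e[k] + Z_o[k]$.

$$Z_e[k] = \sum_{n=0}^{N-1} z_e[n] \cos(2\pi kn/N)$$

$$Z_o[k] = -j \sum_{n=0}^{N-1} z_o[n] \sin(2\pi kn/N)$$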


Singleton’s procedure continued

For N odd we can write

$$Z_e[k] = z[0] + \sum_{n=1}^{(N-1)/2} (z[n] + z[N-n]) \cos(2\pi kn/N),$$

$$Z_o[k] = -j \sum_{n=1}^{(N-1)/2} (z[n] - z[N-n]) \sin(2\pi kn/N).$$

Because of symmetry the values of $Z_e[k]$ and $Z_o[k]$ need only be computed for $0 \le k \le (N-1)/2$. Note that there were pairs of two’s that cancelled out.

The above sums can be evaluated using

$$\text{multiplies} = 2\,\frac{N-1}{2}\,\frac{N-1}{2} + 2\,\frac{N-1}{2}\,\frac{N-1}{2} = (N-1)^2.$$

Depending on whether or not there are two ALUs and how they are arranged, the multiplication of complex values by real values in a single instruction time may be possible. This would result in the reduction of the number of multiplies by another factor of two.


Singleton’s procedure completed

The even N case is left as an exercise (not assigned).

There is going to be a pass through the data to compute the even and odd parts at the start of the procedure and a similar pass at the end. This will add some additional overhead.

Depending on how the particular DSP architecture we are using does things, a speed up of perhaps as much as 4 to 8 times may be possible over brute force.

This works even if N is prime!


Is all sweetness and light?

Of course, not all is sweetness and light. There are many worries

associated with efficiently computing DFTs. Some of these are:

◮ It is not always possible to compute a DFT in-place. Quite often it is necessary to swing between a pair of working areas as one moves between layers.

◮ Does there exist code or at least an algorithm for efficiently computing DFTs for the prime factors? There is always the possibility of Singleton’s procedure at least dangling the prospect of a four times speed up. However, better speedups may be possible.

◮ The transformed values generally need to be reordered. Permutation arrays are useful but these too consume memory resources.


Three multiplier complex multiplication

In general $(a + jb) \times (c + jd) = (ac - bd) + j(bc + ad)$.

This can be written using three multiplications as

$$(a + jb) \times (c + jd) = a(c - d) + (a - b)d + j\,[\,b(c + d) + (a - b)d\,].$$

[Figure: diagram of the three-multiplier complex multiplication.]

When multiplying by a constant, $c + jd$, the $c + d$ and $c - d$ can be obtained by table lookup.
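The three-multiplier form translates directly into C. In this sketch the caller passes c − d and c + d precomputed, as would be the case when they come from a table for a constant multiplier. The function name `cmul3` and its argument layout are assumptions for illustration.

```c
/* Complex product (a + jb)(c + jd) using three real multiplications.
 * c_minus_d and c_plus_d can come from a table when c + jd is a constant. */
void cmul3(double a, double b, double d,
           double c_minus_d, double c_plus_d,
           double *re, double *im)
{
    double shared = (a - b) * d;        /* multiplication 1, used twice */
    *re = a * c_minus_d + shared;       /* multiplication 2: gives ac - bd */
    *im = b * c_plus_d + shared;        /* multiplication 3: gives bc + ad */
}
```

For (1 + 2j)(3 + 4j), calling `cmul3(1, 2, 4, -1, 7, &re, &im)` gives re = −5 and im = 10. The saved multiplication is paid for with extra additions, so whether this is a win depends on the relative cost of the two on the target.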


C FFT — An old friend!

    /* Fast Fourier Transform Function (fft2)

       Adapted from:

         The Fast Fourier Transform and its Applications
         J. W. Cooley, P. A. Lewis, and P. D. Welch
         IEEE Transactions on Education, Vol. 12, No. 1,
         March 1969, pp 27-34.

       28Feb87 Converted to C     .. K. Metzger
       06Feb91 High-C conversion  .. K. Metzger

       Function forms the discrete Fourier transform of an array
       of double precision complex values. An integer power of
       two number of values is assumed to be contained in a huge
       array.

       void fft2(data, log2n, direction)

       data       huge pointer to double precision complex valued
                  data stored re,im,re,im,...

       log2n      int log base 2 of number of points to
                  transform. Allowed range is 1 thru NLIMIT.

       direction  int which is negative if going from time to
                  frequency (uses -sine and divides values by number
                  of complex values). If >=0 goes from frequency to
                  time.
    */


Bit reverse reorder the input

    #include <math.h>

    static double pi = 0.0;   /* assumed file-scope variable; set on first call */

    void fft2(double *data, int log2n, int direction)
    {
        unsigned n, i, j, el, le, le_half, to_freq;
        register unsigned val_i, rev_i;
        double *ptr1, *ptr2, temp, dbl_n, arg, t_re, t_im, u_re, u_im, w_re, w_im;

        if (pi==0.0) pi=4.0*atan(1.0);
        to_freq=(direction<0) ? 1 : 0;
        dbl_n=(double)(n=1<<log2n);

        /* bit reverse reorder: swap complex value i with value rev_i */
        for (i=1; i<n-1; i++) {
            val_i=i; rev_i=0;
            for (j=0; j<(unsigned)log2n; j++) {
                rev_i=(rev_i<<1)|(val_i&0x0001);
                val_i>>=1;
            }
            if (rev_i>i) {
                temp= *(ptr1=data+(i<<1));
                *ptr1= *(ptr2=data+(rev_i<<1));
                *ptr2++=temp;
                temp= *(++ptr1);
                *ptr1= *ptr2;
                *ptr2=temp;
            }
        }

The C5510 has hardware to (hopefully) simplify this task.


Compute the FFT and maybe normalize

        for (el=0; el<(unsigned)log2n; el++) {
            le=(le_half=1<<el)<<1;
            u_re=1.0; u_im=0.0;
            w_re=cos(arg=pi/le_half);
            w_im=(to_freq) ? -sin(arg) : sin(arg);
            for (j=0; j<le_half; j++) {
                for (i=j; i<n; i+=le) {
                    ptr2=(ptr1=data+((i+le_half)<<1))+1;
                    t_re= *ptr1*u_re-*ptr2*u_im;
                    t_im= *ptr1*u_im+*ptr2*u_re;
                    ptr2=data+(i<<1);
                    *ptr1++= *ptr2++-t_re;
                    *ptr1= *ptr2-t_im;
                    *ptr2--+=t_im;
                    *ptr2+=t_re;
                }
                t_re=u_re;
                u_re=u_re*w_re-u_im*w_im;
                u_im=t_re*w_im+u_im*w_re;
            }
        }
        if (to_freq) {
            for (i=0; i<n; i++) {
                *data++/=dbl_n;
                *data++/=dbl_n;
            }
        }
        return;
    }
