dsp kit signal processing

Upload: amirshafiq123

Post on 07-Feb-2018

233 views

Category:

Documents


0 download

TRANSCRIPT

  • 7/21/2019 DSP kit Signal processing

    1/18

    Application ReportSPRA947A June 2009

    1

    Signal Processing Examples Using the TMS320C67xDigital Signal Processing Library (DSPLIB)

    Anuj Dharia & Rosham Gummattira TMS320C6000 Software Applications

    ABSTRACT

    The TMS320C67xdigital signal processing library (DSPLIB) provides a set of C-callable,assembly-optimized functions commonly used in signal processing applications, e.g.,filtering and transform. The DSPLIB includes several functions for each processing category,based on the input parameter conditions, to provide parameter-specific optimal performance.Therefore, it is important to understand the differences and requirements of the functions ineach category. This application report presents the usage and performance of three keysignal processing categories, i.e., finite impulse response (FIR), bi-quadratic infinite impulse

    response (IIR), and fast Fourier transform (FFT), to help users better utilize DSPLIB in theirsystem development.

    Contents

    1 Introduction 2. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    2 Benchmarking 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.1 Emulation/Simulation Setup 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.2 Cycle Count Measurement 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .2.3 Example Scenario and Expected Performance 5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    3 Examples 6. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.1 Finite Impulse Response (FIR) Filter 6. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    3.1.1 DSPF_sp_fir_gen Single Precision Generic FIR filter 7. . . . . . . . . . . . . . . . . . . . . . . . . .3.1.2 DSPF_sp_fir_r2 Single Precision Radix 2 FIR filter 7. . . . . . . . . . . . . . . . . . . . . . . . . . .3.1.3 DSPF_sp_fir_cplx Single Precision Complex Radix 2 FIR filter 7. . . . . . . . . . . . . . . . .3.1.4 FIR Example 8. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    3.2 Infinite Impulse Response (IIR) Filter 10. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.2.1 DSPF_sp_biquad Single Precision Bi-quadratic IIR filter 10. . . . . . . . . . . . . . . . . . . . . .3.2.2 IIR Filter Example 10. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    3.3 Fast Fourier Transform (FFT) 12. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.3.1 DSPF_sp_cfftr2_dit Complex Radix 2 FFT using Decimation-In-Time 13. . . . . . . . . .3.3.2 DSPF_sp_cfftr4_dif Complex Radix 4 FFT using Decimation-In-Frequency 13. . . . .

    3.3.3 DSPF_sp_fftSPxSP Cache Optimized Mixed Radix FFT with digit reversal 13. . . . . .3.3.4 FFT Example 15. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .3.3.5 Performance Analysis 16. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    4 References 17. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    Trademarks are the property of their respective owners.

  • 7/21/2019 DSP kit Signal processing

    2/18

    SPRA947A

    2 Signal Processing Examples Using the TMS320C67x Digital Signal Processing Library (DSPLIB)

    List of Figures

    Figure 1 Memory Hierarchy and Potential Overhead 3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    Figure 2 C6713 Linker Command File 5. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    Figure 3 Frequency Response of a Low-pass FIR Filter 8. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    Figure 4 FIR Filter Input 9. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .Figure 5 FIR Filter Output 9. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    Figure 6 Frequency Response of Low-Pass IIR Filter 11. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    Figure 7 IIR Filter Input 11. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    Figure 8 IIR Filter Output 12. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    Figure 9 512-Point FFT Input 16. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    Figure 10 Magnitude of the Output 16. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    List of Tables

    Table 1 C6713 DSK Key Features 4. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    Table 2 Stall Cycles related to L1D 6. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .Table 3 FIR Filter Design Specifications 8. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    Table 4 FIR Filter Benchmarks (240 Kernel Coefficients (nh) and 200 Output Samples (nr)) 9. . . .

    Table 5 IIR Filter Design Specifications 10. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    Table 6 IIR Filter Benchmarks (200 output samples (nx)) 12. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    Table 7 Single-Pass FFT Benchmarks for 1024 Point input 16. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    Table 8 Single-Pass vs. Multi-Pass FFT Benchmarks 17. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .

    1 Introduction

    TMS320C67x is an advanced very long instruction word (VLIW) processor well suited forreal-time signal processing applications with its high computing power and large on-chipmemory. It also provides enhanced direct memory access (EDMA) and cache to efficientlytransfer data to/from off-chip memory/device. To help users shorten the time-to-market in systemdevelopment, we provide a set of assembly-optimized functions named digital signal processinglibrary (DSPLIB). Each function in the DSPLIB is designed to provide the best performancepossible by optimally utilizing available resources and avoiding potential resource conflicts.

    The DSPLIB includes several functions for each processing category, based on the inputparameter conditions to provide parameter-specific optimal performance. When users utilizeDSPLIB, therefore, it is important to understand the differences and requirements of thefunctions in each category.

    It is also important to understand potential overhead related to memory hierarchy to estimateand improve the actual performance of a system being developed, Figure 1 shows the memoryhierarchy of C67xand related potential overhead. For example, when new code needs to befetched and/or the whole program does not fit the level-one program cache (L1P), L1P cachemisses can occur, stalling the CPU until the required code is fetched. Similarly, when the wholedata do not fit the level-one data cache (L1D) and/or a new set of data needs to be transferredto/from off-chip memory/device, L1D cache misses stall the CPU. All L1P and L1D misses areserviced by the level-two cache/SRAM (L2 cache/SRAM).

  • 7/21/2019 DSP kit Signal processing

    3/18

    SPRA947A

    3Signal Processing Examples Using the TMS320C67x Digital Signal Processing Library (DSPLIB)

    C67x

    CPU

    L1P L1D

    L2 Cache/SRAM

    DMA/EMIF Off-chip memory

    1 2

    3

    4

    1

    2

    3

    4

    L1 program cache misses

    L1 data cache misses

    L2 cache misses

    Off-chip memory accesses

    Figure 1. Memory Hierarchy and Potential Overhead

    Similar to the L1D misses, L2 cache misses occur if the whole code and data do not fit the L2cache and/or a new set of data needs to be transferred to/from off-chip memory/device. The L2miss overhead can be significant compared to the L1P/L1D miss overhead because the L2cache needs to communicate with slow off-chip memory/device.

    L2 SRAM can also be used to service L1D/L1P misses with DMA to transfer code/data betweenL2 SRAM and off-chip memory/device. The data transfer with DMA is typically more effectivethan that with L2 cache due to its nature of longer burst transactions, thereby reducing memoryaccess latency overhead. However, the DMA transfer can involve more programming effortbecause data transfers and synchronization have to be manually managed. TMS320C67xprovides both cache and DMA mechanisms to allow users to choose a right mechanismdepending on situations.

    This application note presents the usage and performance of three key categories, i.e., finiteimpulse response (FIR) filter, infinite impulse response (IIR) filter and fast Fourier transform(FFT), to help TMS320C67x users utilize DSPLIB in system development.

    2 Benchmarking

    2.1 Emulation/Simulation Setup

    A TMS320C6713 DSP starter kit (DSK) is used in this application report to measure cyclecounts. Table 1 lists key features of the C6713 DSK, which are important factors in the

    performance analysis and optimization. More details on the C6713 internal memory structureand operations can be found in the TMS320C621x/C671x DSP Two-Level Internal MemoryReference Guide (SPRU609).

  • 7/21/2019 DSP kit Signal processing

    4/18

    SPRA947A

    4 Signal Processing Examples Using the TMS320C67x Digital Signal Processing Library (DSPLIB)

    Table 1. C6713 DSK Key Features

    Item Description

    Clock frequency 225 MHz

    L1P 4-kbyte, direct-mapped, 64-byte cache line

    L1D 4-kbyte, 2-way set associative, 32-byte cache line, 64-bit wide dual-ported

    L2 SRAM 5-cycle L1P miss penalty, 4-cycle L1D miss penalty, up to 64 Kbytes, four 64-bit

    banks

    L2 cache 5-cycle L1P miss penalty, 4-cycle L1D miss penalty, up to 64 Kbytes, 1/2/3/4-way

    set associative, 128-byte cache line, four 64-bit banks

    L2 to L1D read path 128 bits

    L1D to L2 write buffer 32-bit, 4-entry, L2 can process a write request every 2 cycles

    EMIF 32-bit bus

    The C6713 DSK is connected to a PC using a USB A/B connector cable. If you use simulation,select C67xx Cycle Accurate Simulator. The cycle counts obtained from simulation might notbe accurate because the simulator ignores L1/L2 cache misses and off-chip memory accesses.

    Software version numbers used in this application report are as follows:

    Code Composer Studio version 2.2

    C67x DSPLIB version 1.0

    2.2 Cycle Count Measurement

    The built-in timer in C6713 is used to measure cycle counts for DSPLIB examples. The followingsample code shows how to set up the timer and measure the cycle counts with the Chip SupportLibrary (CSL).

    hTimer = TIMER_open(TIMER_DEVANY,0); /* open a timer */

    /**/

    /* Configure the timer. 1 count corresponds to 4 CPU cycles in C67 */

    /**/

    /* control period initial value */

    TIMER_configArgs(hTimer, 0x000002C0, 0xFFFFFFFF, 0x00000000);

    /* */

    /* Compute the overhead of calling the timer. */

    /* */

    start = TIMER_getCount(hTimer); /* to remove L1P miss overhead */

    start = TIMER_getCount(hTimer); /* get count */

    stop = TIMER_getCount(hTimer); /* get count */

  • 7/21/2019 DSP kit Signal processing

    5/18

    SPRA947A

    5Signal Processing Examples Using the TMS320C67x Digital Signal Processing Library (DSPLIB)

    overhead = stop start;

    start = TIMER_getCount(hTimer); /* get count */

    /* */

    /* Call a function here. */

    /* */

    diff = (TIMER_getCount(hTimer) start) overhead; /* get count */

    TIMER_close(hTimer);

    printf(%d cycles \n, diff*4);

    The maximum resolution of the timer is 4 CPU cycles since the input to the timer is fixed to theCPU clock, divided by four. The function call overhead for the TIMER_getCount() is roughlymeasured and compensated. Additional information on the timer registers can be found inTMS320C6000 Peripherals Reference Guide(SPRU190).

    2.3 Example Scenario and Expected Performance

    To analyze the potential overhead related to the memory hierarchy, data is stored in the L2SRAM of the chip. Figure 2 shows a linker command file used to store data on chip.

    MEMORY

    {

    L2SRAM: o = 00000000h l = 00010000h /* 64 kbytes */

    }

    SECTIONS

    {

    .cinit > L2SRAM

    .text > L2SRAM

    .stack > L2SRAM

    .bss > L2SRAM

    .const > L2SRAM

    .data > L2SRAM

    .far > L2SRAM

    .switch > L2SRAM

    .sysmem > L2SRAM

    .tables > L2SRAM

    .cio > L2SRAM}

    Figure 2. C6713 Linker Command File

    The overhead with off-chip memory accesses is not presented in this report because theoverhead can be minimized by overlapping the data transfer time and the computations timewith the EDMA.

  • 7/21/2019 DSP kit Signal processing

    6/18

    SPRA947A

    6 Signal Processing Examples Using the TMS320C67x Digital Signal Processing Library (DSPLIB)

    Additionally, a scenario in which data are already in L1D is not presented in this report becausethese cycle counts are very close to the formula cycle counts listed in the TMS320C67x DSPLibrary Programmers Reference(SPRU657).

    When data in on chip in the L2 SRAM, L1D miss overhead needs to be considered. Table 2 listsexpected stall cycles related to L1D read and/or write transactions. When there are read

    transactions only, the number of stall cycles is the number of L1D read misses times the L1Dmiss penalty (i.e. 4 cycles). In case of write transactions only, there is no stall unless the writebuffer is full. The write buffer is 32-bit wide, and allows up to four outstanding misses.

    When there are both read and write transactions, the L1D read miss penalty can increasebecause, to maintain data coherency, the write buffer is flushed before a read miss is serviced.

    Table 2. Stall Cycles related to L1D

    Transaction Number of stall cycles

    Read transaction only Number of L1D read misses * L1D miss penalty.

    Write transaction only No stall cycle unless the write buffer is full.

    Read and write transactions Number of L1D read misses * (L1D miss penalty + additional cycles for

    write buffer flush)

    3 Examples

    This section presents the usage and performance of three key signal processing categories, i.e.,finite impulse response (FIR) filter, infinite impulse response (IIR) filter and fast Fourier transform(FFT). To minimize the variation in cycle count measurement, be sure to select the Resetmenu(under Debugin Code Composer Studio) before running an example.

    3.1 Finite Impulse Response (FIR) Filter

    A generalized FIR filter of N filter coefficients, h(k), is defined as:

    y(n) N1

    k0

    h(k)x(n k)

    The C67x DSPLIB provides 3 FIR functions:

    DSPF_sp_fir_gen

    DSPF_sp_fir_r2

    DSPF_sp_fir_cplx

    The prototype and requirements of the FIR functions follow.

    3.1.1 DSPF_sp_fir_gen Single Precision Generic FIR filtervoid DSPF_sp_fir_gen(const float * restrict x, const float * restrict h, float* restrict r, int nh, int nr)

    x points to a floating point array of length nr+nh1 which holds the input samples.

    h points to a floating point array of length nh which holds the coefficients. The coefficientsneed to be placed in h in reverse order.

    r points to a floating point array of length nr which holds the outputs.

  • 7/21/2019 DSP kit Signal processing

    7/18

    SPRA947A

    7Signal Processing Examples Using the TMS320C67x Digital Signal Processing Library (DSPLIB)

    nh is the number of coefficients. nh must be greater than or equal to 4.

    nris the number of outputs. nr must be greater than or equal to 4.

    3.1.2 DSPF_sp_fir_r2 Single Precision Radix 2 FIR filter

    void DSPF_sp_fir_r2(const float *restrict x, const float *restrict h, float*restrict r, int nh, int nr)

    x points to a floating point array of length nr+nh1 which holds the input samples. x must bedouble word aligned and padded with 4 extra words at the end.

    h points to a floating point array of length nh which holds the coefficients. The coefficientsneed to be placed in h in reverse order. h must be double word aligned and padded with 4extra words at the end.

    r points to a floating point array of length nr which holds the outputs.

    nh is the number of coefficients. nh must be a multiple of 2 and greater than or equal to 8.

    nr is the number of outputs. nr must be a multiple of 2 and greater than or equal to 2.

    3.1.3 DSPF_sp_fir_cplx Single Precision Complex Radix 2 FIR filter

    void DSPF_sp_fir_cplx(const float *restrict x, const float *restrict h, float*restrict r, int nh, int nr)

    xpoints to a floating point array of length 2*(nr+nh1) which holds the input samples. x mustbe double word aligned and point to the 2*(nh1)th element (&x[2*nh1]).

    h points to a floating point array of length 2*nh which holds the coefficients. The coefficientsneed to be placed in h in normal order. h must be double word aligned.

    r points to a floating point array of length 2*nr which holds the outputs.

    nh is the number of coefficients. nh must be greater than or equal to 5.

    nr is the number of outputs. nr must be a multiple of 2 and greater than or equal to 2.

    3.1.4 FIR Example

    This example demonstrates the use of the C67x DSPLIB FIR filtering capabilities. First, filtercoefficients are generated in Matlab using the Filter Design and Analysis Tool (Matlab command:fdatool) with filter specifications listed in Table 3. The frequency response of this FIR filter isshown in Figure 3.

    Table 3. FIR Filter Design Specifications

    Filter Type Low-pass

    Order 239 (240 for DSP_fir_sym)

    Design Method Window (Kaiser with a beta of 0.5)

    Sampling frequency 44,100 Hz

    Cut-off frequency 10,000 Hz

  • 7/21/2019 DSP kit Signal processing

    8/18

    SPRA947A

    8 Signal Processing Examples Using the TMS320C67x Digital Signal Processing Library (DSPLIB)

    Figure 3. Frequency Response of a Low-pass FIR Filter

    Input data to the FIR filter are generated in floating point format as follows:

    x[i] = (sin(2 * PI * F1 * i / Fs) + sin(2 * PI * F2 * i / Fs));

    Where F1 and F2 are the two input data frequencies and Fs is the sampling frequency (44,100Hz). Figure 4 and Figure 5 show the results of the FIR filter. Figure 4 shows a sinusoidal inputdescribed earlier where F1 = 370 Hz and F2 = 10500 Hz. Since 10,500 Hz is above the cut-offfrequency, this frequency is attenuated and only a 370 Hz sinusoidal wave remains as shown inFigure 5.

    Figure 4. FIR Filter Input

  • 7/21/2019 DSP kit Signal processing

    9/18

    SPRA947A

    9Signal Processing Examples Using the TMS320C67x Digital Signal Processing Library (DSPLIB)

    Figure 5. FIR Filter Output

    Table 4 lists the performance for the three FIR functions. As expected, the performance is bestfor the radix 2 FIR because its more stringent restrictions allow for better loop unrolling andsoftware pipelining. The complex FIR takes significantly longer than the rest.

    Table 4. FIR Filter Benchmarks (240 Kernel Coefficients (nh) and 200 Output Samples (nr))

    Number of Cycles

    Functions Formula Observed (with data in L2 SRAM)

    DSPF_sp_fir_gen 24508 = (4*floor((nh1)/2)+14)*(ceil(nr/4)) + 8 24968

    DSPF_sp_fir_r2 24034 = (nh* nr)/2 + 34 24528

    DSPF_sp_fir_cplx 96033 = 2* nh * nr + 33 96864

    With the data in the L2 SRAM the observed cycle count is very similar to the formula cyclecount. The discrepancy is realized when call overhead, L1D read misses, and L1D write missesare taken into account.

    3.2 Infinite Impulse Response (IIR) Filter

    The DSPLIB offers a 2ndorder bi-quadratic IIR filter defined as:

    y(n) 2

    k0

    b(k)x(n k)2

    k1

    a(k)y(n k)

    where x(n) and y(n) are the input and output data, and a(k) and b(k) are the filter coefficients.The a(k) are auto-regressive (AR) coefficients (poles of the transfer function). The b(k) aremoving-average (MA) coefficients (zeros of the transfer function).

    IIR filters generally have nonlinear phase responses, but can meet magnitude responsespecifications with much lower orders than FIR filters. However, due to their nature of instability,care must be taken in their design to meet stability criteria.

    The prototype and requirements of the IIR function follows.

  • 7/21/2019 DSP kit Signal processing

    10/18

    SPRA947A

    10 Signal Processing Examples Using the TMS320C67x Digital Signal Processing Library (DSPLIB)

    3.2.1 DSPF_sp_biquad Single Precision Bi-quadratic IIR filter

    void DSPF_sp_biquad (float* x, float* b, float* a, float* delay, float* r, intnx)

    x points to a floating point array of length nx which holds the input samples.

    b points to a floating point array of length 3 which holds the MA coefficients: b[0], b[1], andb[2]. b must be double-word aligned.

    a points to a floating point array of length 2 which holds the AR coefficients: a[0] and a[1]. amust be double-word aligned.

    delay points to a floating point array of length 2 which holds the delay coefficients: d[0] andd[1]

    r points to a floating point array of length nx which holds the output samples.

    nx is the length of the coefficient array. nx must be greater than or equal to 4.

    3.2.2 IIR Filter Example

    This example demonstrates the use of the C67x DSPLIB IIR filtering capabilities. First, filtercoefficients are generated in Matlab using the Filter Design and Analysis Tool (Matlab command:fdatool) with filter specifications listed in Table 5. The frequency response of this FIR filter isshown in Figure 6.

    Table 5. IIR Filter Design Specifications

    Filter Type Low-pass

    Order 2

    Design Method Butterworth

    Sampling frequency 44,100 Hz

    Cut-off frequency 8,000 Hz

    Figure 6. Frequency Response of Low-Pass IIR Filter

  • 7/21/2019 DSP kit Signal processing

    11/18

    SPRA947A

    11Signal Processing Examples Using the TMS320C67x Digital Signal Processing Library (DSPLIB)

    Input data to the IIR filter are generated in floating point format as follows:

    x[i] = (sin(2 * PI * F1 * i / Fs) + sin(2 * PI * F2 * i / Fs));

    where F1 and F2 are the two input data frequencies and Fs is the sampling frequency(44,100 Hz). Figure 7 and Figure 8 show the results of the FIR filter. Figure 7 shows a sinusoidalinput described earlier where F1 = 370 Hz and F2 = 18500 Hz. Since 18,500 Hz is above the

    cut-off frequency, this frequency is attenuated and only a 370 Hz sinusoidal wave remains asshown in Figure 8.

    Figure 7. IIR Filter Input

    Figure 8. IIR Filter Output

    Table 6 shows the performance of the IIR filter.

    Table 6. IIR Filter Benchmarks (200 output samples (nx))

    Number of Cycles

    Functions Formula Observed (with data in L2 SRAM)

    DSPF_sp_biquad 876 = 4* nx + 76 1196

  • 7/21/2019 DSP kit Signal processing

    12/18

    SPRA947A

    12 Signal Processing Examples Using the TMS320C67x Digital Signal Processing Library (DSPLIB)

    With the data in the L2 SRAM, the observed cycle count is very similar to the formula cyclecount. The discrepancy is realized when call overhead, L1D read misses, and L1D write missesare taken into account.

    3.3 Fast Fourier Transform (FFT)

    FFT is widely used for frequency-domain processing and spectrum analysis. It is acomputationally efficient discrete Fourier transform (DFT) defined as:

    X(k) N1

    n0

    xnWknN

    , k 0,..., N 1

    where

    WknN e2j nkN

    The C67x DSPLIB provides three FFT functions:

    1. DSPF_sp_cfftr2_dit

    2. DSPF_sp_cfftr4_dif

    3. DSPF_sp_fftSPxSP

    The C67x DSPLIB also provides two inverse FFT functions:

    1. DSPF_sp_icfftr2_dit (used for both radix 2 and radix 4 FFTs)

    2. DSPF_sp_ifftSPxSP

    The prototypes and restrictions for the FFT functions follow:

    3.3.1 DSPF_sp_cfftr2_dit Complex Radix 2 FFT using Decimation-In-Time

    void DSPF_sp_cfftr2_dit(float * x, float * w, short n)

    x points to a floating-point array of length 2*n which holds n complex input samples. x mustbe double-word aligned. After running the function, the output will also be stored in x. Theoutput must be bit-reversed using the bit reverse function found in the FFT support: bit_rev.

    w points to a floating-point array of length n which holds the n/2 twiddle factors. w can becreated with the radix 2 twiddle generation function found in the FFT support: tw_genr2fft.After creating the array, w must be bit-reversed using bit_rev.

    n is the length of the FFT in complex samples. n must be a power of 2 and greater than orequal to 32.

    3.3.2 DSPF_sp_cfftr4_dif Complex Radix 4 FFT using Decimation-In-Frequency

    void DSPF_sp_cfftr4_dif(float * x, float * w, short n)

    x points to a floating point array of length 2*n which holds n complex input samples. Afterrunning the function, the output will also be stored in x. The output must be digit-reversedusing the digit reverse functions found in the FFT support: R4DigitRevIndexTableGen &digit_reverse. R4DigitRevIndexTableGen creates index tables that are used by thedigit_reverse.

    w points to a floating-point array of length (3/2)*n which holds the (3/4)*n complex twiddlefactors. w can be created with the radix 4 twiddle generation function found in the FFTsupport: tw_genr4fft.

    n is the length of the FFT in complex samples. n must be a power of 4.

  • 7/21/2019 DSP kit Signal processing

    13/18

    SPRA947A

    13Signal Processing Examples Using the TMS320C67x Digital Signal Processing Library (DSPLIB)

    3.3.3 DSPF_sp_fftSPxSP Cache Optimized Mixed Radix FFT with digit reversal

    void DSPF_sp_fftSPxSP(int n, float* x, float* w, float* y, unsigned charbrev[], int n_min, int offset, int n_max)

    n is the length of the FFT in complex samples. nmust be a power of 2 and greater than orequal to 8 and less than or equal to 8192.

    x points to a floating-point array of length 2*n which holds n complex input samples. x mustbe double-word aligned.

    w points to a floating-point array of length 2*n which holds n complex twiddle factors. w canbe created with the twiddle generation function found in FFT support: tw_genSPxSPfft.

    y points to a floating-point array of length 2*n which holds n complex output samples. y mustbe double-word aligned.

    brev is a 64-entry bit-reverse data table. The values for this table can be found in the FFTsupport file brev_table.h

    n_min is the smallest FFT butterfly used in computation.

    offset is the index of complex FFT samples from the start of the main FFT.

    n_max size of the main FFT in complex samples.

    The DSPF_sp_fftSPxSP routine has been modified to allow for higher cache efficiency. Theroutine can be called in a single-pass or multi-pass fashion. As single-pass, the routine behaveslike other DSPLIB FFT routines: if the total data size accessed by the routine fits into the L1D,then the single-pass use is most efficient. The total data size accessed for an N-point FFT is N x2 complex parts x 4 bytes per floating point input value plus the same amount for the twiddlefactor array: 16XN bytes. The L1D capacity for the C671x device is 4 Kbytes. If N less than orequal to 256, the single pass is the best choice. If N is greater than 256, then the multi-passimplementation would be the best choice. For more details on cache, see the TMS320 DSPCache Users Guide (SPRU656).

    3.3.3.1 DSPF_sp_fftSPxSP Single-Pass Implementation

    The single-pass implementation is straight forward.

    n = n_max

    x, w, and y all point to their start of the arrays

    n_min = the radix of the FFT

    offset =0

    For example when N =256 and radix = 4:

    DSPF_sp_fftSPxSP(N, &x[0], &w[0], y, brev, radix, 0, N);

    3.3.3.2 DSPF_sp_fftSPxSP Multi-Pass Implementation

    The multi-pass implementation requires multiple calls of the same function. The goal of thisimplementation is to break up a large FFT into several FFTs that are small enough to fit into theL1D (N

  • 7/21/2019 DSP kit Signal processing

    14/18

    SPRA947A

    14 Signal Processing Examples Using the TMS320C67x Digital Signal Processing Library (DSPLIB)

    The multi-pass implementation requires two stages. The first stage only has one function call:

    n = n_max

    x, w, and y all point to start of the array

    n_min = n divided by the number of sub-FFTs

    offset =0

    The second stage computes the individual sub-FFTs. The second stage has 1 call for eachsub-FFT:

    n = N divided by the number of sub-FFTs. This value corresponds to the length of thesub-FFT.

    x is offset to point to the start of the sub-FFT array.

    w points to the twiddle factors for the second stage. This pointer is the same for each of thesub-FFTs (see example).

    y points to the start of the array

    n_min is the radix of the sub-FFT.

    offset is the integer offset that corresponds to the start of the sub-FFT in the input data set.

    N = N

    For example, when N = 1024 and radix= 4, the multi-pass implementation requires four passes:

    /* stage one */

    DSPF_sp_fftSPxSP(N, &x[0], &w[0], y, brev, N/4, 0, N);

    /* stage two */DSPF_sp_fftSPxSP(N/4,&x[2*3*N/4], &w[2*3*N/4], y, brev, radix, N*3/4, N);

    DSPF_sp_fftSPxSP(N/4,&x[2*2*N/4], &w[2*3*N/4], y, brev, radix, N*2/4, N);

    DSPF_sp_fftSPxSP(N/4,&x[2*1*N/4], &w[2*3*N/4], y, brev, radix, N*1/4, N);

    DSPF_sp_fftSPxSP(N/4,&x[0], &w[2*3*N/4], y, brev, radix, 0, N);

    Also, when N = 2048 and radix = 2, the multi-pass implementation requires 16 passes:

    /* stage one */

    DSPF_sp_fftSPxSP(N, &x[0], &w[0], y, brev, N/16, 0, N);

    /* stage two */

    for(i=0;i

  • 7/21/2019 DSP kit Signal processing

    15/18

    SPRA947A

    15Signal Processing Examples Using the TMS320C67x Digital Signal Processing Library (DSPLIB)

    3.3.4 FFT Example

    This example demonstrates the use of the C67x DSPLIB FFT filtering capabilities.

    Input data to the FFT is generated in floating point format as follows:

    for(i=0; i

  • 7/21/2019 DSP kit Signal processing

    16/18

    SPRA947A

    16 Signal Processing Examples Using the TMS320C67x Digital Signal Processing Library (DSPLIB)

    Figure 10. Magnitude of the Output

    3.3.5 Performance Analysis

    Table 7 shows the performance of a 1024 point FFT using the single-pass implementation ofDSPF_sp_fftSPxSP.

    Table 7. Single-Pass FFT Benchmarks for 1024 Point input

    Number of Cycles

    Functions Formula Observed (with data in L2 SRAM)

    DSPF_sp_fftSPxSP 14464=3*ceil(log4(N)1)*N + 21*

    ceil(log4(N)1) + 2*N + 44

    28444

    Clearly, 28,444 cycles is significantly more than the cycle count formula of 14,464. As explainedabove, the single-pass implementation creates a significant amount of cache trashing. Table 8shows the cache advantages of the multi-pass implementation by comparing the cache missesof the single-pass and multi-pass implementations.

    Table 8. Single-Pass vs. Multi-Pass FFT Benchmarks

    First Call Second Call Third Call Fourth Call Fifth Call Total Observed

    Single-Pass

    DSPF_sp_fftSPxSP

    28444

    L1D Miss Cycles 13870 13870

    L1P Miss Cycles 122 122

    Multi-Pass

    DSPF_sp_fftSPxSP

    24976

    L1D Miss Cycles 5724 1107 1013 1019 1081 9944

    L1P Miss Cycles 93 30 5 5 5 138

  • 7/21/2019 DSP kit Signal processing

    17/18

    SPRA947A

    17Signal Processing Examples Using the TMS320C67x Digital Signal Processing Library (DSPLIB)

    The total cycle count for the multi-pass implementation is significantly less than the single-passimplementation. The multi-pass implementation allows less L1D miss cycles in each of thesub-FFTs because all the data used by each 256 length sub-FFT fits into the 4 Kbyte cache.

    4 References1. TMS320C67x DSP Library Programmers Reference(SPRU657)

    2. TMS320C6000 Peripherals Reference Guide(SPRU190)

    3. TMS320C621x/C671x DSP Two-Level Internal Memory Reference Guide(SPRU609)

    4. TMS320 DSP Cache Users Guide(SPRU656)

    5. John G. Proakis and Dimitris G. Manolakis, Digital Signal Processing, Principles Algorithms,and Applications, Prentice Hall, Third Edition, 1996.

    6. Emmanuel C. Ifeachor and Barrie W. Jervis, Digital Signal Processing, A Practical Approach,Prentice Hall, Second Edition, 2002.

  • 7/21/2019 DSP kit Signal processing

    18/18

    I M P O R T A N T N O T I C E

    T e x a s I n s t r u m e n t s I n c o r p o r a t e d a n d i t s s u b s i d i a r i e s ( T I ) r e s e r v e t h e r i g h t t o m a k e c o r r e c t i o n s , m o d i f i c a t i o n s , e n h a n c e m e n t s , i m p r o v e m e n t s , a n d o t h e r c h a n g e s t o i t s p r o d u c t s a n d s e r v i c e s a t a n y t i m e a n d t o d i s c o n t i n u e a n y p r o d u c t o r s e r v i c e w i t h o u t n o t i c e . C u s t o m e r s s h o u l d o b t a i n t h e l a t e s t r e l e v a n t i n f o r m a t i o n b e f o r e p l a c i n g o r d e r s a n d s h o u l d v e r i f y t h a t s u c h i n f o r m a t i o n i s c u r r e n t a n d c o m p l e t e . A l l p r o d u c t s a r e s o l d s u b j e c t t o T I s t e r m s a n d c o n d i t i o n s o f s a l e s u p p l i e d a t t h e t i m e o f o r d e r a c k n o w l e d g m e n t .

    T I w a r r a n t s p e r f o r m a n c e o f i t s h a r d w a r e p r o d u c t s t o t h e s p e c i f i c a t i o n s a p p l i c a b l e a t t h e t i m e o f s a l e i n a c c o r d a n c e w i t h T I s s t a n d a r d

    w a r r a n t y . T e s t i n g a n d o t h e r q u a l i t y c o n t r o l t e c h n i q u e s a r e u s e d t o t h e e x t e n t T I d e e m s n e c e s s a r y t o s u p p o r t t h i s w a r r a n t y . E x c e p t w h e r e m a n d a t e d b y g o v e r n m e n t r e q u i r e m e n t s , t e s t i n g o f a l l p a r a m e t e r s o f e a c h p r o d u c t i s n o t n e c e s s a r i l y p e r f o r m e d .

    T I a s s u m e s n o l i a b i l i t y f o r a p p l i c a t i o n s a s s i s t a n c e o r c u s t o m e r p r o d u c t d e s i g n . C u s t o m e r s a r e r e s p o n s i b l e f o r t h e i r p r o d u c t s a n d a p p l i c a t i o n s u s i n g T I c o m p o n e n t s . T o m i n i m i z e t h e r i s k s a s s o c i a t e d w i t h c u s t o m e r p r o d u c t s a n d a p p l i c a t i o n s , c u s t o m e r s s h o u l d p r o v i d e a d e q u a t e d e s i g n a n d o p e r a t i n g s a f e g u a r d s .

    T I d o e s n o t w a r r a n t o r r e p r e s e n t t h a t a n y l i c e n s e , e i t h e r e x p r e s s o r i m p l i e d , i s g r a n t e d u n d e r a n y T I p a t e n t r i g h t , c o p y r i g h t , m a s k w o r k r i g h t , o r o t h e r T I i n t e l l e c t u a l p r o p e r t y r i g h t r e l a t i n g t o a n y c o m b i n a t i o n , m a c h i n e , o r p r o c e s s i n w h i c h T I p r o d u c t s o r s e r v i c e s a r e u s e d . I n f o r m a t i o n p u b l i s h e d b y T I r e g a r d i n g t h i r d - p a r t y p r o d u c t s o r s e r v i c e s d o e s n o t c o n s t i t u t e a l i c e n s e f r o m T I t o u s e s u c h p r o d u c t s o r s e r v i c e s o r a w a r r a n t y o r e n d o r s e m e n t t h e r e o f . U s e o f s u c h i n f o r m a t i o n m a y r e q u i r e a l i c e n s e f r o m a t h i r d p a r t y u n d e r t h e p a t e n t s o r o t h e r i n t e l l e c t u a l p r o p e r t y o f t h e t h i r d p a r t y , o r a l i c e n s e f r o m T I u n d e r t h e p a t e n t s o r o t h e r i n t e l l e c t u a l p r o p e r t y o f T I .

    R e p r o d u c t i o n o f T I i n f o r m a t i o n i n T I d a t a b o o k s o r d a t a s h e e t s i s p e r m i s s i b l e o n l y i f r e p r o d u c t i o n i s w i t h o u t a l t e r a t i o n a n d i s a c c o m p a n i e d b y a l l a s s o c i a t e d w a r r a n t i e s , c o n d i t i o n s , l i m i t a t i o n s , a n d n o t i c e s . R e p r o d u c t i o n o f t h i s i n f o r m a t i o n w i t h a l t e r a t i o n i s a n u n f a i r a n d d e c e p t i v e b u s i n e s s p r a c t i c e . T I i s n o t r e s p o n s i b l e o r l i a b l e f o r s u c h a l t e r e d d o c u m e n t a t i o n . I n f o r m a t i o n o f t h i r d p a r t i e s m a y b e s u b j e c t t o a d d i t i o n a l r e s t r i c t i o n s .

    R e s a l e o f T I p r o d u c t s o r s e r v i c e s w i t h s t a t e m e n t s d i f f e r e n t f r o m o r b e y o n d t h e p a r a m e t e r s s t a t e d b y T I f o r t h a t p r o d u c t o r s e r v i c e v o i d s a l l

    e x p r e s s a n d a n y i m p l i e d w a r r a n t i e s f o r t h e a s s o c i a t e d T I p r o d u c t o r s e r v i c e a n d i s a n u n f a i r a n d d e c e p t i v e b u s i n e s s p r a c t i c e . T I i s n o t r e s p o n s i b l e o r l i a b l e f o r a n y s u c h s t a t e m e n t s .

    T I p r o d u c t s a r e n o t a u t h o r i z e d f o r u s e i n s a f e t y - c r i t i c a l a p p l i c a t i o n s ( s u c h a s l i f e s u p p o r t ) w h e r e a f a i l u r e o f t h e T I p r o d u c t w o u l d r e a s o n a b l y b e e x p e c t e d t o c a u s e s e v e r e p e r s o n a l i n j u r y o r d e a t h , u n l e s s o f f i c e r s o f t h e p a r t i e s h a v e e x e c u t e d a n a g r e e m e n t s p e c i f i c a l l y g o v e r n i n g s u c h u s e . B u y e r s r e p r e s e n t t h a t t h e y h a v e a l l n e c e s s a r y e x p e r t i s e i n t h e s a f e t y a n d r e g u l a t o r y r a m i f i c a t i o n s o f t h e i r a p p l i c a t i o n s , a n d a c k n o w l e d g e a n d a g r e e t h a t t h e y a r e s o l e l y r e s p o n s i b l e f o r a l l l e g a l , r e g u l a t o r y a n d s a f e t y - r e l a t e d r e q u i r e m e n t s c o n c e r n i n g t h e i r p r o d u c t s a n d a n y u s e o f T I p r o d u c t s i n s u c h s a f e t y - c r i t i c a l a p p l i c a t i o n s , n o t w i t h s t a n d i n g a n y a p p l i c a t i o n s - r e l a t e d i n f o r m a t i o n o r s u p p o r t t h a t m a y b e p r o v i d e d b y T I . F u r t h e r , B u y e r s m u s t f u l l y i n d e m n i f y T I a n d i t s r e p r e s e n t a t i v e s a g a i n s t a n y d a m a g e s a r i s i n g o u t o f t h e u s e o f T I p r o d u c t s i n s u c h s a f e t y - c r i t i c a l a p p l i c a t i o n s .

    T I p r o d u c t s a r e n e i t h e r d e s i g n e d n o r i n t e n d e d f o r u s e i n m i l i t a r y / a e r o s p a c e a p p l i c a t i o n s o r e n v i r o n m e n t s u n l e s s t h e T I p r o d u c t s a r e s p e c i f i c a l l y d e s i g n a t e d b y T I a s m i l i t a r y - g r a d e o r " e n h a n c e d p l a s t i c . " O n l y p r o d u c t s d e s i g n a t e d b y T I a s m i l i t a r y - g r a d e m e e t m i l i t a r y s p e c i f i c a t i o n s . B u y e r s a c k n o w l e d g e a n d a g r e e t h a t a n y s u c h u s e o f T I p r o d u c t s w h i c h T I h a s n o t d e s i g n a t e d a s m i l i t a r y - g r a d e i s s o l e l y a t t h e B u y e r ' s r i s k , a n d t h a t t h e y a r e s o l e l y r e s p o n s i b l e f o r c o m p l i a n c e w i t h a l l l e g a l a n d r e g u l a t o r y r e q u i r e m e n t s i n c o n n e c t i o n w i t h s u c h u s e .

    T I p r o d u c t s a r e n e i t h e r d e s i g n e d n o r i n t e n d e d f o r u s e i n a u t o m o t i v e a p p l i c a t i o n s o r e n v i r o n m e n t s u n l e s s t h e s p e c i f i c T I p r o d u c t s a r e d e s i g n a t e d b y T I a s c o m p l i a n t w i t h I S O / T S 1 6 9 4 9 r e q u i r e m e n t s . B u y e r s a c k n o w l e d g e a n d a g r e e t h a t , i f t h e y u s e a n y n o n - d e s i g n a t e d p r o d u c t s i n a u t o m o t i v e a p p l i c a t i o n s , T I w i l l n o t b e r e s p o n s i b l e f o r a n y f a i l u r e t o m e e t s u c h r e q u i r e m e n t s .

    F o l l o w i n g a r e U R L s w h e r e y o u c a n o b t a i n i n f o r m a t i o n o n o t h e r T e x a s I n s t r u m e n t s p r o d u c t s a n d a p p l i c a t i o n s o l u t i o n s :

    P r o d u c t s A p p l i c a t i o n s A m p l i f i e r s a m p l i f i e r . t i . c o m A u d i o w w w . t i . c o m / a u d i o D a t a C o n v e r t e r s d a t a c o n v e r t e r . t i . c o m A u t o m o t i v e w w w . t i . c o m / a u t o m o t i v e D L P P r o d u c t s w w w . d l p . c o m B r o a d b a n d w w w . t i . c o m / b r o a d b a n d D S P d s p . t i . c o m D i g i t a l C o n t r o l w w w . t i . c o m / d i g i t a l c o n t r o l C l o c k s a n d T i m e r s w w w . t i . c o m / c l o c k s M e d i c a l w w w . t i . c o m / m e d i c a l I n t e r f a c e i n t e r f a c e . t i . c o m M i l i t a r y w w w . t i . c o m / m i l i t a r y L o g i c l o g i c . t i . c o m O p t i c a l N e t w o r k i n g w w w . t i . c o m / o p t i c a l n e t w o r k P o w e r M g m t p o w e r . t i . c o m S e c u r i t y w w w . t i . c o m / s e c u r i t y M i c r o c o n t r o l l e r s m i c r o c o n t r o l l e r . t i . c o m T e l e p h o n y w w w . t i . c o m / t e l e p h o n y R F I D w w w . t i - r f i d . c o m V i d e o & I m a g i n g w w w . t i . c o m / v i d e o R F / I F a n d Z i g B e e S o l u t i o n s w w w . t i . c o m / l p r f W i r e l e s s w w w . t i . c o m / w i r e l e s s

    M a i l i n g A d d r e s s : T e x a s I n s t r u m e n t s , P o s t O f f i c e B o x 6 5 5 3 0 3 , D a l l a s , T e x a s 7 5 2 6 5 C o p y r i g h t 2 0 0 9 , T e x a s I n s t r u m e n t s I n c o r p o r a t e d

    http://www.ti.com/lprfhttp://www.ti.com/wirelesshttp://www.ti-rfid.com/http://www.ti.com/videohttp://microcontroller.ti.com/http://www.ti.com/telephonyhttp://power.ti.com/http://www.ti.com/securityhttp://logic.ti.com/http://www.ti.com/opticalnetworkhttp://interface.ti.com/http://www.ti.com/militaryhttp://www.ti.com/clockshttp://www.ti.com/medicalhttp://dsp.ti.com/http://www.ti.com/digitalcontrolhttp://amplifier.ti.com/http://www.ti.com/audiohttp://www.ti.com/wirelesshttp://www.ti.com/lprfhttp://www.ti.com/videohttp://www.ti-rfid.com/http://www.ti.com/telephonyhttp://microcontroller.ti.com/http://www.ti.com/securityhttp://power.ti.com/http://www.ti.com/opticalnetworkhttp://logic.ti.com/http://www.ti.com/militaryhttp://interface.ti.com/http://www.ti.com/medicalhttp://www.ti.com/clockshttp://www.ti.com/digitalcontrolhttp://dsp.ti.com/http://www.ti.com/broadbandhttp://www.dlp.com/http://www.ti.com/automotivehttp://dataconverter.ti.com/http://www.ti.com/audiohttp://amplifier.ti.com/