efficient multirate signal processing


Uncoupling Software Architectures from Platforms

Tone Relay in Voice-over-Packet Data Networks

SPRN157

March 2002
Practical solutions for DSP system developers

Efficient Multirate Signal Processing


Embedded Edge March 2002 3

Stan Runyon, Editor-in-Chief
[email protected]

Mike Robinson, Managing [email protected]

Tim Moran, Creative Director

Donna Moran, Art Director

Genevieve Joerger, Director, Custom Solutions
[email protected]

Gregory Montgomery, Director of [email protected]

Grace Adamo, Project Manager

Robert Steigleider, Ad [email protected]

Susan Harper, Circulation Director
[email protected]

Embedded Edge is published by Texas Instruments, Inc. and produced in cooperation with CMP Media Inc. Entire contents Copyright © 2002 Texas Instruments, Inc. The publication of information regarding any other company’s products or services does not constitute Texas Instruments’ approval, warranty or endorsement thereof. To subscribe on-line, visit: www.edtn.com/customsolutions/edge/subscribe.fhtml

Code Composer Studio, TMS320, TMS320C6000, C6000, TMS320C5000, C5000, TMS320C2000, C2000, DSP/BIOS and eXpressDSP are trademarks of Texas Instruments, Inc. All other trademarks are the property of their respective owners.


Volume 3 March 2002 Number 1

Inside This Issue

Insighter: Startling Tales of Nonfiction
DSP applications are startlingly complex and more powerful, but they can be written and debugged in reasonable amounts of time.

Breakpoints
News from the providers of embedded systems development products and services.

Cover: Efficient Multirate Signal Processing
With its dual MAC, abundance of buses, and copious internal RAM, the TMS320C55x allows a high degree of parallelism, just the ticket for applications using decimating filter banks.

New Methodology Tames Distributed Apps
Coordination-centric design uncouples software architectures from platforms, taking the pain out of developing distributed applications, heterogeneous as well as homogeneous.

Passing Tones in Voice-over-Packet Systems
A tone relay can eliminate troublesome signaling tone distortion and other problems in packet data networks that employ speech compression.

Launchings
New products and services for embedded systems developers.

On the Edge
DSP RTOSs come of age, but developers need more.


Insighter

Startling Tales of Nonfiction

If you’re amazed by the ongoing advances in DSP silicon and development tools, you ought to be nothing less than astounded by the commensurate reach of DSP applications. Not only are the applications startlingly complex and more powerful, but they can be written and debugged in reasonable amounts of time, without a mainframe computer and swarms of computer scientists.

Take the cover story, “Efficient Multirate Signal Processing with the C55x.” Here, Michael Tsiroulnikov, of MIKET DSP Solutions, shows how to take advantage of the copious on-chip features of the TMS320C55x generation of DSPs (a dual multiplier-accumulator, multiple buses, and lots of RAM) for multirate operations in the complex domain. In particular, he shows how to shape decimating filter banks, a key component of multirate signal processing systems.

Working in the complex domain helps sidestep the limitations of real-domain bandpass sampling, increases performance, and simplifies later signal processing. At the same time, the large store of RAM satisfies the often unquenchable thirst of multirate signal processing systems for fast RAM in which to store filter bank coefficients. In addition, the C55x’s speed, boosted in large part by its dual MAC, enables you to compensate for burgeoning program code by running more instances of an algorithm in a given amount of time. All told, you should realize about an order-of-magnitude cumulative improvement in performance over previous generations of DSPs when you implement signal detectors, decimating filter banks, and other functions that filter input data through an array of same-length complex filters.

Increasingly, such powerhouse signal processors are finding themselves working alongside general-purpose processors, on one chip, no less. This teamwork is great for distributed heterogeneous applications, but it poses a challenge if you’re seeking to wring the best efficiency out of the on-chip resources.

For instance, transformational algorithms are best performed by digital signal processors, whereas control-intensive and graphical user interface functions are best served by a general-purpose microprocessor. Accordingly, applications that are both control-intensive and algorithmically complex typically call for multiple-processor architectures.

Take heart. A new methodology, termed coordination-centric design, uncouples the architecture from the platform and takes the pain out of developing distributed heterogeneous applications. As David McCooey, Ken Hines, and Ross Ortega of Consystant Design Technologies relate in our second article, in developing distributed software applications, software architects must consider multiple operating systems and think about how applications will map to specific processors. Specialists could be required to deal with obscure programming paradigms or tight performance requirements. Such implementation details can obscure the high-level software architecture.

Coordination-centric design lets designers cleanly separate the two and automatically produces efficient source code from high-level software models. Thus it supports heterogeneous distributed architectures and allows you to specify low-level implementation details without reference to the software design.

Eliminating trouble is the theme of our third article, this time the troublesome signaling tone distortion and other problems that can pop up in speech-compressed packet data networks. The solution: Use a tone relay to pass tones in voice-over-packet systems.

Adaptive Digital Technologies’ Scott Kurtz delineates the problems caused by low-rate speech compression algorithms and delves into the design and nuances of tone relays and their accompanying algorithms. For instance, although a tone passer seems straightforward at first glance, its implementation details can make or break performance. Kurtz’s thorough discussion should help clear up any distortion.

Stan [email protected]


Breakpoints

Spectrum Signal Wins Defense R&D Funds

Spectrum Signal Processing Inc. (Burnaby, B.C., www.spectrumsignal.com) is slated to receive over $500,000 (Canadian) from the Defence Industrial Research Program (DIRP) to create additional capability for its flexComm line of software-defined radios. The DIRP, an initiative of Defence R&D Canada, funds defense-related industry research and development. In this case, the money will go toward developing hardware and software that adds ultrawideband capability to Spectrum’s SDRs. The ultrawideband SDRs, in turn, will serve as subsystems in a modem processing system for a software-configurable, military satellite communications terminal.

TI Joins RapidIO Open-Standards Trade Group

Texas Instruments Incorporated (Dallas, www.ti.com) has joined the nonprofit, 50-plus-member RapidIO Trade Association (San Francisco; www.rapidio.org), an open-standards interconnect group that is pushing the use of “inside-the-box” high-performance switch fabric technology to multigigabit-per-second transmission rates. Toward that end, the association’s RapidIO specification, released in March 2001, defines how chips and boards communicate within a system.

TI Embraces Linux for Its OMAP Platform

Texas Instruments, Inc. (Dallas; www.ti.com) has extended its OMAP wireless architecture to include the Linux operating system. The extension will give developers of Linux applications for 2.5G and 3G mobile devices easy access to the company’s DSPs. In addition, TI has selected RidgeRun, Inc. (Boise, Idaho; www.ridgerun.com), developer of the DSPLinux operating system, to directly assist its customers. It has also worked with RidgeRun to enhance DSPLinux to take advantage of the real-time multimedia capabilities of the OMAP platform.

Dy 4 Taken by Force (Computers, That Is)

Dy 4 Systems (Kanata, Ont.; www.dy4.com) has been acquired by, and is now a business unit of, Force Computers (Fremont, Calif., www.forcecomputers.com). The acquisition is the result of a merger of Force’s parent company, Solectron, and C-MAC Industries. Dy 4 will keep its brand of COTS ruggedized embedded computing products for the defense and aerospace industry. At the same time, it will draw on Force’s hardware and software expertise in the commercial and telecom markets. For its part, Force aims to work Dy 4’s DSP expertise into its OEM business.

Questra Caps Second Financing Round

Questra Corporation (Rochester, N.Y., www.questra.com) has completed a $19 million second round of financing with Menlo Ventures and Trident Capital. Its enterprise software integrates mission-critical, intelligent devices with a company’s service and support infrastructure via the Internet.

Duo Joins PCTEL Executive Team

PCTEL (Milpitas, Calif.; www.pctel.com) has named John Schoen and Jeff Miller to its executive team. Schoen will serve as both chief operating officer and chief financial officer, positions he held at SAFCO Technologies. Before that, he filled various financial spots during 19 years at Motorola. Miller, who will take over as vice president of development, led SAFCO’s Test and Measurement Group for 3 years before it was acquired. Earlier, he led the implementation of the Cellnet cellular network and managed large software projects for Motorola’s Cellular Infrastructure Group.


Multirate Signal Processing

EFFICIENT MULTIRATE SIGNAL PROCESSING WITH THE C55X

With its dual MAC, abundance of buses, and copious internal RAM, the TMS320C55x allows a high degree of parallelism, just the ticket for applications using decimating filter banks.

By Michael Tsiroulnikov

Compared with previous generations of digital signal processors, Texas Instruments’ TMS320C55x generation of DSPs is particularly well suited to multirate operations in the complex domain. Its single-cycle dual multiplier/accumulator (MAC), multiplicity of buses, and generous endowment of internal RAM give it about an order-of-magnitude cumulative improvement in performance over previous generations of DSPs when implementing signal detectors, decimating filter banks, and other functions that filter input data through an array of same-length complex filters.

Complex-domain operations make it possible to overcome the limitations of real-domain bandpass sampling, thereby helping to maximize a filter’s decimation ratio without any loss in precision. A decimating filter bank can be viewed as an application-specific transform or a signal compression technique in which blocks of real signal data are converted into shorter sets of downsampled values.

PROCESSOR HIGHLIGHTS

Compared with its predecessor, the TMS320C54x generation, the C55x is highly advanced. Most notably, its single-cycle dual MAC, supported by multiple buses that let it retrieve three 16-bit data words in a single cycle, by itself gives the processor a twofold performance advantage. In addition, the new processors have significantly more internal RAM. These resources can be used very effectively to realize various filter banks and functional transforms.

Figure 1. Organizing the data into blocks of symmetric and asymmetric coefficients and then taking them in sequential order for the even and odd indices, respectively, keeps auxiliary pointer manipulations to a minimum. The coefficient data pointer (CDP) points to the prefolded input data; the auxiliary registers (ARx) point to the coefficient arrays.

Relative to code written for the C54x, code for the C55x has less loop initialization and finalization overhead, an important advantage in staged multirate signal processing, where filters are often short and the overhead can therefore add significantly to the execution time.

Other valuable new features of the C55x are its coefficient data pointer (CDP) register and its auxiliary registers (ARx), which now allow long immediate offset values to be included in the op code without adding any extra cycles when the program is executing. This capability makes it easier to develop assembly code in a C-like fashion, thereby shortening the R&D cycle.

The key to the successful design of a DSP application is appropriate data organization for both the filter bank coefficients and the input data. Properly organized data makes for a smooth, natural flow of operations with minimal pointer manipulations and no unnecessary discontinuities in the program flow.

To exploit the advantages of complex-domain filtering, the input data vectors should be decomposed into symmetric and asymmetric components; the decomposition helps to boost performance as much as twofold. The new CDP register is exploited in this connection, being used to point to the decomposed data, while the auxiliary registers point to the filter banks.

An obvious drawback of multirate signal processing is its need for a large amount of fast RAM in which to store filter bank coefficients. That need usually won’t be a problem for the C55x generation because most of the processors have more than enough internal RAM. Also, their speed can compensate for the increase in program code by allowing more instances of an algorithm to be run in a given time.

REALIZING A GENERIC COMPLEX FILTER BANK

Transforming real-domain input data into the complex domain simplifies later signal processing, especially if the application makes use of various nonparametric spectrum analysis techniques. Working in the complex domain also helps boost performance. The major MIPS savings obviously come from the decimation process (the filters are run less often), and those savings are directly proportional to the decimation ratio.

The complex domain permits a higher decimation ratio because real-domain bandpass subsampling is subject to many constraints [1] that must be strictly adhered to if an application is to operate properly. The key problem is aliasing: for a given sampling frequency, fs, a signal with frequency fs / 2 + d is aliased by a “mirrored” signal, fs / 2 - d, for any deviation, d. In effect, the whole frequency plan is folded multiple times within the interval {0 . . . fs / 2}.

The operations on analytic signals in the complex domain are much simpler, because the frequencies (k * fs + d), for any integer k, are aliased into themselves; the frequency plan can be viewed as going around in a circle. Consequently, the central frequency of the signal band can be in any relationship to the sampling frequency, and the choice of the decimation ratio is much relaxed, although it’s still constrained by the filter properties, the signal/noise spectrum, and the application requirements.

So how, exactly, do you convert a real-domain signal, X(t), into a complex-domain signal, o_i(t)? For:

X(t) = {x(t), x(t - 1), x(t - 2), ..., x(t - L + 1)}^T

o_i(t) = os_i(t) + j * oa_i(t)

where L = the length of the filter, what’s needed is a complex filter bank made up of N subfilters, Fz_i, each of which has the form:

Fz_i = Fs_i + j * Fa_i

where Fs_i indicates the real components of Fz_i, and Fa_i indicates the imaginary ones:

Fs_i = {fs_{i,0}, fs_{i,1}, ..., fs_{i,L-1}}^T

Fa_i = {fa_{i,0}, fa_{i,1}, ..., fa_{i,L-1}}^T

i = (0 ... N - 1)

(all of the subfilters are assumed to be of the same length) and o_i(t) can be given as:

o_i(t) = Fz_i^T * X(t)
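As a reference point, the product above can be written out directly. This is an illustrative sketch rather than the article’s listing: the name cfilter_direct is hypothetical, and double precision stands in for the 16-bit fixed-point data a C55x implementation would use.

```c
#include <stddef.h>

/* One complex output sample: o_i = Fz_i^T * X, with Fz_i = Fs_i + j*Fa_i.
 * x[k] holds x(t - k), so x[0] is the newest sample.
 * Costs 2 * len multiply-accumulates per subfilter. */
void cfilter_direct(const double *x,
                    const double *coef_s, const double *coef_a,
                    size_t len, double *o_re, double *o_im)
{
    double re = 0.0, im = 0.0;
    for (size_t k = 0; k < len; k++) {
        re += coef_s[k] * x[k];   /* real part, os_i(t) */
        im += coef_a[k] * x[k];   /* imaginary part, oa_i(t) */
    }
    *o_re = re;
    *o_im = im;
}
```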

For a very wide class of filter banks, Fz_i can be represented as the sum of two components, a strictly symmetrical real one, Fs_i, and an asymmetrical (antisymmetric) imaginary one, Fa_i, and both components of each subfilter will still comply with the rules for the Hilbert transform. The straightforward approach to performing these calculations involves using the firs() and firsn() instructions of the C55x, but there’s a superior approach. If the bank has many subfilters of equal length, it’s more efficient to prefold the input data, X(t), around its center point, tc, into two sets, one the sum of the samples x(tc + d) and x(tc - d) and the other the difference between those same samples:

s(d) = x(tc + d) + x(tc - d)
a(d) = x(tc + d) - x(tc - d)
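The prefolding step and the half-length filtering it enables can be sketched in C as follows. This is an illustration under stated assumptions, not the article’s code: the filter length is even (so the center point falls between two samples), the names prefold and cfilter_folded are hypothetical, and the coefficient arrays hold only the taps on the newer side of the center.

```c
#include <stddef.h>

/* Prefold a block of even length len around its center:
 *   s[d] = x(tc + d) + x(tc - d),  a[d] = x(tc + d) - x(tc - d)
 * x[k] holds x(t - k); the center lies between x[len/2 - 1] and x[len/2]. */
void prefold(const double *x, size_t len, double *s, double *a)
{
    size_t half = len / 2;
    for (size_t d = 0; d < half; d++) {
        double newer = x[half - 1 - d];
        double older = x[half + d];
        s[d] = newer + older;
        a[d] = newer - older;
    }
}

/* Filter prefolded data with half-length coefficient sets.
 * hs[d] = Fs[half - 1 - d] and ha[d] = Fa[half - 1 - d], which relies on
 * Fs being symmetric and Fa antisymmetric about the center. */
void cfilter_folded(const double *s, const double *a,
                    const double *hs, const double *ha, size_t half,
                    double *o_re, double *o_im)
{
    double re = 0.0, im = 0.0;
    for (size_t d = 0; d < half; d++) {
        re += hs[d] * s[d];
        im += ha[d] * a[d];
    }
    *o_re = re;
    *o_im = im;
}
```

The folding costs one pass of additions and subtractions over the input, after which every subfilter needs only half as many multiplications; the more subfilters share the same folded data, the better the one-time cost is amortized.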

With the input data prefolded, our task now is to optimize the processing operations so as to exploit the C55x’s dual MAC as fully as possible. Two things that can be done in pursuit of that goal are:

• Use CDP (which is, by default, a pointer to coefficients) as a pointer into the prefolded data arrays, while using auxiliary pointers to operate with coefficients.

• Regroup filter banks. When CDP points to the summed data, s(d), we need both auxiliary pointers to operate with arrays from the Fs_i set. Therefore we can set up two “parallel” blocks of filter coefficients for either Fs_i or Fa_i and use even and odd subfilter sets, {Fs_0, Fs_2, Fs_4, ...} and {Fs_1, Fs_3, Fs_5, ...}, linkable into different single-access RAM segments. Then the filtering with Fs_{2i} will be performed simultaneously with the filtering with Fs_{2i+1}. The modifications of operations with Fa_i are similar. Of course, the code will definitely be simpler if the number of filters in the filter bank is even.

This approach becomes more efficient as the number of subfilters grows; indeed, it makes sense for four or more subfilters.
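The pairing of even and odd subfilters can be pictured in plain C. The sketch below only emulates the data flow; it is not C55x assembly and does not reproduce the article’s Listing 2. The function name mac_pair and the use of short/long for 16-bit data and wide accumulators are assumptions.

```c
#include <stddef.h>

/* Emulates one dual-MAC inner loop: a single shared data pointer (the
 * role CDP plays) feeds two accumulators, while two coefficient
 * pointers (the role of the ARx registers) walk an even and an odd
 * subfilter in parallel. */
void mac_pair(const short *data,
              const short *coef_even, const short *coef_odd,
              size_t half, long *acc_even, long *acc_odd)
{
    long a0 = 0, a1 = 0;
    for (size_t d = 0; d < half; d++) {
        short v = data[d];                /* one data fetch ...       */
        a0 += (long)coef_even[d] * v;     /* ... used by MAC unit 0   */
        a1 += (long)coef_odd[d] * v;      /* ... and by MAC unit 1    */
    }
    *acc_even = a0;
    *acc_odd = a1;
}
```

Each pass over the folded data therefore produces one component of two different subfilters, which is why an even number of subfilters keeps the code simplest.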

You can use a very similar approach for the more general case of a complex input and a complex filter bank; the processing flow and the code are essentially the same. Be sure to use the Hermitian (conjugate transposed) operator while doing complex-domain filtering; otherwise the frequency plan will be inverted.

The best trade-off among decimation filter, block size, and decimation ratio must be determined case by case. Sometimes it makes perfect sense to use longer filters with better shape factors (narrower transition bands), which allow a higher decimation ratio. But since it’s impossible to provide general advice appropriate for all circumstances, it’s up to you to decide on the trade-offs when you develop your algorithm.

In general, because the filter length is typically longer than the block size, the processing flow involves concatenating incoming data with previously saved block(s) of data and then updating the “saved signal” array. The filter bank function is C-callable:

void APP_filter_bank(APP_tSc *pSc, Int *psEven, Int *psOdd);

The filter function accepts as parameters a pointer to a scratchpad structure and pointers to the even and odd filter blocks (Listing 1; the code for concatenation and updating described above has been omitted, since it’s straightforward), each grouped into an array of the form:

Int _aaAppEvenFilterBlock[2*N/2][L/2];

where the first N/2 rows of half-sized filters (L/2) are taken in sequential order from the symmetric set, Fs_i, and the rest are taken from the asymmetric set, Fa_i, for the even and odd indices, respectively. Organizing the data in that manner (Figure 1) helps keep auxiliary pointer manipulations to a minimum.
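One way to fill such blocks is sketched below. It is a hypothetical helper, not part of the article’s listings: the name build_block, its parity argument, and the 16-bit typedef for Int are assumptions, and the sizes N = 6 and L = 100 are borrowed from the MF R1 example later in the article. The half-sized rows are assumed to be prepared by the caller.

```c
#define APP_N 6      /* number of subfilters (from the MF R1 example) */
#define APP_L 100    /* filter length (from the MF R1 example) */

typedef short Int;   /* assumed to be 16 bits, as on the C55x */

/* Fill one coefficient block. parity = 0 builds the even block
 * (subfilters 0, 2, 4), parity = 1 the odd block (1, 3, 5).
 * Rows 0..N/2-1 come from the symmetric set, rows N/2..N-1 from the
 * asymmetric set, each row holding a half-sized filter. */
void build_block(Int fs_half[APP_N][APP_L / 2],
                 Int fa_half[APP_N][APP_L / 2],
                 int parity,
                 Int block[2 * (APP_N / 2)][APP_L / 2])
{
    for (int r = 0; r < APP_N / 2; r++) {
        int i = 2 * r + parity;                 /* source subfilter index */
        for (int k = 0; k < APP_L / 2; k++) {
            block[r][k] = fs_half[i][k];
            block[APP_N / 2 + r][k] = fa_half[i][k];
        }
    }
}
```

Because both blocks share the same row order, two free-running coefficient pointers can step through them in lockstep without any per-subfilter address arithmetic.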

DATA FORMAT

In the same manner, the output of the algorithm should represent the data in the format that’s most suitable for the further processing. A C-like structure APP_tOut is declared, which may contain other fields if required by an application. The function APP_filter_bank() only stores o_i(t) (decimated complex signal pairs per subfilter) into predefined locations.

Note that the filter length doesn’t have to be even. The code for odd filter lengths is a bit more complicated but very similar.

The assembly code example makes extensive use of long immediate ARx offsets, since they require no extra cycles on the C55x and the increase in the code size is acceptable (Listing 2; as with Listing 1, the code for concatenation and updating described earlier has been omitted). The use of these offsets greatly helps with writing, debugging, and testing assembly code by running bit-exact comparisons with a C version of the same algorithm. Despite the relative complexity of the C55x instruction set, I found that I could write C55x assembly code faster than I could write code for previous DSPs, like the C54x or earlier generations.

Figure 2. This diagram shows the transmission characteristics of the second-bin (900-Hz) filter before decimation. Both axes are scaled linearly. Ideally, the second filter’s stopband attenuation would reach its maximum value over the passbands of all the other frequency bins. The filter’s behavior in the transition bands is less important, since the only disturbing factor is circuit noise.

Figure 3. By examining the postdecimation signal and filters using polar coordinates, it’s easy to see that the passbands of all the MFR1 bins will overlap after decimation and will fall within the same tinted sector around the vector (-1, 0), which will simplify postprocessing. That won’t necessarily be true for other applications.

As you can see from the listing, filtering with Fs_i and Fa_i is done using an external loop over the block repeat counter register (BRC0). The pointers to the input array (pSc->asPlus or pSc->asMinus) and the output array (pOut->sC or pOut->sS) are updated when the first iteration is over, but the pointers to the coefficients remain free-running.

Note that the internal loop initialization and finalization overhead takes only three processor cycles. In certain cases it may be possible to decrease the number of cycles even further.

THE FILTER BANK FUNCTION IN ACTION

Let’s demonstrate the use of this procedure to detect multifrequency R1 signals as specified in the ITU-T Q.320, Q.322, and Q.323 recommendations. Although a very simple example, it demonstrates the advantages of this approach very nicely.

Valid multifrequency signals are represented by two, and only two, tones from a set of six frequencies f_i = 700 + i * 200 Hz, i = (0 ... 5), with 1.5% tolerance and a duration of at least 60 ms. In this kind of application, noise isn’t usually a problem because tone detection is done before a call is established; hence there’s no voice on the line to corrupt the tone signal.
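The frequency plan stated above is small enough to capture in a few helper functions. The sketch below is illustrative only; the names are hypothetical, and it covers just the frequency rule, not the 60-ms duration check.

```c
#include <math.h>

#define MFR1_TONES 6

/* Nominal frequency of MF R1 tone i, i = 0..5: 700, 900, ..., 1700 Hz. */
double mfr1_nominal_hz(int i)
{
    return 700.0 + 200.0 * i;
}

/* Returns 1 if f_hz lies within the 1.5% tolerance of tone i, else 0. */
int mfr1_in_tolerance(double f_hz, int i)
{
    double nominal = mfr1_nominal_hz(i);
    return fabs(f_hz - nominal) <= 0.015 * nominal;
}

/* Number of distinct two-tone combinations from six tones: C(6, 2). */
int mfr1_pair_count(void)
{
    return MFR1_TONES * (MFR1_TONES - 1) / 2;
}
```

For the highest tone, 1,700 Hz, the 1.5% tolerance amounts to 25.5 Hz, which is the figure the passband width of the detector has to accommodate.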

Our filter bank consists of six subfilters of length 100 (input signal sampling frequency fs = 8 kHz), derived from a Hamming prototype and modulated by cos(2π f_i [t - tc]) and sin(2π f_i [t - tc]). (Figure 2 shows the filter shape for the second frequency bin, 900 Hz.) The passband (tinted area) is ±Δf (about 1 dB), with Δf = 40 Hz, and the stopband starts 200 - Δf = 160 Hz from the central frequency (42 dB). These values provide some headroom and allow signal processing with a maximum frequency deviation of 2.35% even for the highest-frequency bin, 1,700 Hz.

The choice of the decimation ratio depends greatly on the operations to be performed on the decimated signal. Although signal detection applications allow for deep subsampling, other applications, such as subband adaptive filtering, may impose another set of (obviously, much harder) constraints on how far a signal may be decimated. In our application, the transition bands (-160:-40) and (40:160), relative to the central frequencies, may be overlapped (Figure 3).

If we consider the shape of the decimation filter only, the Nyquist barrier can be seen as deliberately broken. A signal with the spectrum falling into transition bands beyond the tinted area will be aliased. Fortunately, no fatal consequences result, because the signal there is only noise and there’s no need to reconstruct the original signal. Note that the area of interest (-40:40 off the central frequency, 900 Hz) is “clean.” Nevertheless, it’s overlapped with the signals in adjacent frequency bins and the accompanying noise, all in the stopband area of the filter. Therefore, to utilize a decimation ratio as high as 40, the prototype filter must be “sufficiently good,” with an adequate shape factor and stopband attenuation.

THE INPUT

The input into the function is a block of 40 samples (appended to the saved preceding data), which is transformed into six complex pairs. This number of samples corresponds to 5 ms, an interval that fits well with other signal processing algorithms, like ITU-T G.72x vocoders. The further processing of the decimated signal, to estimate the instantaneous frequency and energy in this low-noise case, is obviously simple. If you’re interested in more advanced techniques, Modern Spectrum Analysis [2] provides an excellent starting point.
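The block handling mentioned at the start of this section (appending 40 new samples to the saved preceding data) can be sketched as a simple shift-and-append. The article omits this code from its listings, so the following is an assumption about one straightforward way to do it; the name update_history and the oldest-first buffer order are hypothetical.

```c
#include <string.h>

#define APP_L     100   /* filter length */
#define APP_BLOCK 40    /* new samples per call (5 ms at 8 kHz) */

/* Shift the oldest APP_BLOCK samples out of the work buffer and append
 * the new block, so that buf always holds the latest APP_L samples in
 * time order (buf[0] oldest, buf[APP_L - 1] newest). */
void update_history(short buf[APP_L], const short block[APP_BLOCK])
{
    memmove(buf, buf + APP_BLOCK,
            (APP_L - APP_BLOCK) * sizeof buf[0]);
    memcpy(buf + (APP_L - APP_BLOCK), block,
           APP_BLOCK * sizeof block[0]);
}
```

A circular buffer would avoid the copy of the 60 retained samples, at the cost of more involved addressing during prefolding.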

For this decimation ratio (dr = 40), with N = 6 subfilters, filter length L = 100, and sampling frequency fs = 8,000 Hz, the theoretical, straightforward MAC count is 2 * L * N * (fs / dr) = 0.24M MACs per second. The processing figure for APP_filter_bank is only 0.11 MIPS (550 clock cycles per function call, at 200 calls per second), including the overhead of the function call. That low number clearly demonstrates the superiority of this approach over the straightforward one and, of course, over traditional approaches, like Goertzel filters, which are usually slower by an order of magnitude.
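The two figures quoted above can be reproduced with integer arithmetic. The helper names below are hypothetical; the inputs are the values given in the text.

```c
/* Multiply-accumulates per second for the straightforward approach:
 * 2 (real and imaginary parts) * L taps * N subfilters * output rate. */
long direct_macs_per_second(long len, long n_filters, long fs_hz, long dr)
{
    return 2 * len * n_filters * (fs_hz / dr);
}

/* Processor cycles per second for a routine called once per block. */
long cycles_per_second(long cycles_per_call, long fs_hz, long dr)
{
    return cycles_per_call * (fs_hz / dr);
}
```

With L = 100, N = 6, fs = 8,000 Hz, and dr = 40, the first function returns 240,000 (0.24M MACs per second), and 550 cycles per call gives 110,000 cycles per second, the 0.11 MIPS cited for APP_filter_bank.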

You can lower both the MIPS and memory requirements even further by using a staged decimation process. For example, the MFR1 input signal allows simple decimation-by-2 filtering, since the highest frequency is sufficiently offset from 2 kHz for the filters to be organized in a way that makes half of the coefficients equal to zero. In addition, the filters in the second stage become half as long, also lowering program RAM requirements.

MANY APPLICATIONS

You can use very similar techniques for different applications: almost any kind of tone detector, signal classifier, or spectrum analyzer, and a wide variety of filter banks, can achieve better performance with this approach.

But how much that performance gain benefits the end user depends on many factors. Although you can improve the performance of an entire signal detection and classification subsystem significantly with these techniques, the subsystem usually represents only a small part of the whole picture. If an entire voice processing system is built using a so-called “universal port” approach (one in which a single DSP performs all the functions required for one or a few channels), the benefits will be modest because the savings in required processor power are minor relative to a typical vocoder’s requirements. If, on the other hand, the voice processing is distributed (that is, if various DSPs are assigned to different functions) and each DSP executes only a few smoothly coexisting algorithms, the difference can be dramatic. There are three main reasons.

For one, such a system may be composed of the best available interoperable algorithms from different vendors. For another, fewer DSPs will likely be needed to achieve the same channel density, since every algorithm may trade off between program memory and per-channel performance. Finally, there’ll likely be no need to attach any external RAM to the DSPs because the DSPs won’t have to keep the entire set of algorithms in their program space. As an added bonus, power consumption will be reduced to just the very low C55x core power plus serial peripherals.

Taken together, these factors will reduce the overall system size and price of future telecommunication infrastructure equipment while increasing its performance, quality, and reliability.

Michael Tsiroulnikov ([email protected]) is the principal DSP engineer at MIKET DSP Solutions in Richmond, B.C. He enjoys developing high-quality, high-performance customer-specific applications on the newest DSPs, especially ones presenting significant R&D challenges. His expertise includes system design, adaptive algorithms, and practical implementations of functional transforms.

NOTES

[1] R. G. Vaughan, N. L. Scott, and R. D. White, “The Theory of Bandpass Sampling,” IEEE Transactions on Signal Processing, vol. 39, no. 9 (September 1991), pp. 1973–1985.

[2] S. Kesler, ed., Modern Spectrum Analysis, IEEE Press, New York, 1986.


Coordination-centric Design

Design Methodology Uncouples Software Architectures from Platforms

A new methodology takes the pain out of developing distributed heterogeneous applications.

By David McCooey, Ken Hines, and Ross Ortega

As hardware manufacturers begin providing single-chip multiple-processor architectures, distributed applications are becoming increasingly common. For software developers looking to use the available processing resources efficiently, multiprocessing, and particularly heterogeneous, architectures present good optimization opportunities. Imaging, filtering, and other transformational algorithms, for example, run best on digital signal processors. Control-intensive operations and graphical user interface functionality, on the other hand, make the best use of the features of a general-purpose microprocessor. Applications that are both control-intensive and algorithmically complex, like multimedia-based products, are excellent candidates for multiple-processor architectures, such as Texas Instruments’ Open Multimedia Applications Platform (OMAP).

Developing distributed software applications, however, introduces new challenges. Software architects must consider multiple OS selections and how applications will be mapped to specific processors. In addition, specialists could be required to deal with obscure programming paradigms or very tight performance requirements. Such implementation details can invade a design to the extent that the high-level software architecture becomes blurred by the low-level partitioning decisions.

A new software methodology called “coordination-centric” design offers a clean separation between software architecture and low-level platform issues. The methodology automatically produces efficient source code from high-level software models. It supports heterogeneous distributed architectures and allows engineers to specify low-level implementation details without reference to the software design process.

In coordination-centric design, developers focus on the creation of a robust software architecture that solves high-level issues demanded by the application, not by the hardware environment. DSP specialists, therefore, are required only to develop the algorithm(s). This approach promotes modularity and component reuse and reduces the time spent in maintaining and debugging code.

A coordination-centric software design flow starts by identifying the basic software elements (the components and special communication “coordinators”), which are created or selected from libraries of preexisting components. Then, the components and coordinators are graphically composed to create a software design.

Once it’s been created, the design is simulated to validate its functional behavior. After that, an architectural mapping phase assigns the pieces to the computing resources, that is, components to processors and coordinators to communication mediums. Commented C code is generated that’s tuned for each operating system running on a specified processor and implements the behavior captured by the coordinators.

Let’s visit the mapping and code generation stages in more detail.

In the mapping phase, components must be assigned to processing resources (an operating system process, for example). Coordinators can be assigned to processing resources or communication mediums. Figure 1 shows three possible mappings of a simple software design to different target architectures.

The final stage of coordination-centric design, automatic code synthesis, translates the engineer’s high-level model into C code. The synthesized code provides the same functionality as the high-level design model, but important optimizations have been made. During code synthesis, component and coordinator distinctions are dissolved; components and coordinators are an artifact of logical design only. This is a significant advantage of coordination-centric code synthesis over other, more traditional object-oriented code generation approaches: by exploiting the semantic information provided by the coordination-centric model, synthesis generates code that’s much more efficient in terms of memory footprint and run-time performance.

In traditional software design, a developer must select an appropriate communication mechanism, depending on the execution location of the interacting components. If the components execute in the same operating system process, shared memory is an efficient mechanism. If they’re in different processes, an operating system interprocess communication call is appropriate. If they reside on different processors, a remote communication mechanism must be selected. All of these choices are low-level implementation decisions that complicate software architectures. In coordination-centric design, these details are dealt with before code generation and aren’t embedded in the software design.
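The three cases in this paragraph amount to a small decision rule that a mapping step could apply mechanically once each component's execution location is known. The Python sketch below is illustrative only: the `Location` type, the transport names, and `select_transport` are hypothetical and are not taken from Strata or from its generated C code.

```python
from typing import NamedTuple


class Location(NamedTuple):
    """Where a component executes (hypothetical model, for illustration)."""
    processor: str   # e.g. "linux-pc" or "c6711-dsp"
    process: str     # operating system process or task name


def select_transport(a: Location, b: Location) -> str:
    """Pick a communication mechanism from two execution locations."""
    if a.processor != b.processor:
        return "remote-link"         # different processors
    if a.process != b.process:
        return "interprocess-call"   # same processor, different processes
    return "shared-memory"           # same operating system process


controller = Location("linux-pc", "player")
reader = Location("linux-pc", "player")
decoder_local = Location("linux-pc", "decoder")
decoder_dsp = Location("c6711-dsp", "decoder")
```

Because the rule depends only on the mapping, the component code on either side of a coordinator can stay the same when a component moves from one location to another.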

Figure 1. One of the beauties of coordination-centric design is that the software architecture remains the same with different implementations of the hardware. Software components map to processors, coordinators to either processors or communication mediums. Here, all components and coordinators map to a single Linux process (a); two components map to different Linux processes and the coordinator maps to a Linux interprocess call (b); one component is mapped to a Linux process and one to a TI DSP running DSP/BIOS, and the coordinator is mapped to a wireless communications link (c).

In traditional component-based design methodologies, each component must contain not only its core functional behavior, but also its behavior interacting with other components. The entanglement of functional and interaction behavior forces software developers to create tightly coupled software components. If the interaction behavior among components changes, all those components must be updated to reflect the changes. These intrusive modifications are required even when the core functionality of the component isn’t altered. What’s more, tightly coupled software components are difficult to understand, debug, and maintain and even more difficult to reuse in different designs.

BEHIND THE BENEFITS

With a coordination-centric framework, however, software components are loosely coupled through the use of coordinators, which contain the interaction code between components. A design rule requires a coordinator between any interacting components.

Thus a coordination-centric framework encourages reuse by allowing system components to be interchanged in a “plug and play” fashion; indeed, entire subsystems can be designed independently from other system components. Also, the debugging phase is greatly reduced because the correctness of the software functionality is separated from its interaction behavior. Essentially, both debugging and maintaining software are simplified because there are no “back door” interaction paths with the components. All communication and interaction is explicit and clearly defined.

A distributed MP3 player application is a good example of how coordination-centric design works. Let’s look at an MP3 player we designed using Strata, a commercial implementation of coordination-centric design.

MP3 PLAYER

The design steps were incorporating existing code into components, using models to validate the behavior of the architecture, mapping the design to a single processor, and synthesizing the code. The very last steps were to map the same design to a distributed-processor architecture and synthesize the code for each processor.

We used a PC running Linux as the development host for Strata, for design entry and behavioral simulation of the system prior to mapping. The same PC also served as the target for the single-processor mapping, which produced a Linux executable that plays MP3 songs from a file system on the PC. For the distributed-processor mapping, we used an Ethernet crossover cable that connected the PC to a Texas Instruments TMS320C6711 DSP board. The distributed mapping of the player placed the control and file reader portion on the PC and the decoder and playback on the DSP board. We loaded the synthesized code for the DSP board using TI’s Code Composer Studio IDE.

Figure 2. With Strata, you compose a software system by dragging and dropping components and coordinators and making connections through bindings. The Controller manipulates the MP3FrameReader, which in turn reads an MP3 file and provides it to the Decoder. The rounded blue blocks are reusable software components, the square green blocks the coordinators.

Figure 3. An action triple model allows software contained in action trapezoids to be executed upon the occurrence of certain trigger events, provided that the related mode of operation is true. For instance, within the Controller, when the Play mode is active and the Check event occurs, the code snippet inside playing_check is executed.

The first task was to partition the player functionality into coordination-centric components and coordinators. Because of the streaming nature of the application, the coordinators required were simple data pipes. (Figure 2 shows a Strata tool view of the software design for the MP3 player.) To demonstrate the main decoding algorithm, we wrote a simple C-based control GUI to start, pause, and stop the song.

Pushing into the Controller component, shown in Figure 3, highlights some interesting features of the coordination-centric design framework. The graphical syntax captures the event-driven programming model, which uses modes, events, and actions. Modes are internal state variables that guard the action. Actions can turn modes on or off, generate events, modify variables, and make foreign subroutine calls. Events are generated either automatically by the system or explicitly by an action. The combination of a mode, an event, and an action constitutes an “action triple.” For example, when the Play mode is true and the Check event arrives, the action playing_check is triggered.
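The mode, event, and action relationship can be pictured as a guarded dispatch table. The following Python sketch is a toy model written for this explanation, not Strata's runtime: the `Component` class, its `dispatch` method, and the `log` list are all hypothetical names.

```python
class Component:
    """Toy event-driven component built from (mode, event, action) triples."""

    def __init__(self):
        self.modes = set()      # names of the modes that are currently on
        self.triples = []       # list of (mode, event, action)
        self.log = []           # record of the actions that ran

    def add_triple(self, mode, event, action):
        self.triples.append((mode, event, action))

    def dispatch(self, event):
        """Run every action whose event matches and whose guard mode is on."""
        active = set(self.modes)  # snapshot: guards see the modes at arrival
        for mode, trigger, action in self.triples:
            if trigger == event and mode in active:
                action(self)


def playing_check(component):
    component.log.append("playing_check")


controller = Component()
controller.add_triple("Play", "Check", playing_check)

controller.dispatch("Check")   # Play is off, so nothing runs
controller.modes.add("Play")
controller.dispatch("Check")   # Play is on, so playing_check runs
```

Only the second `Check` event triggers the action, because the Play mode guards it.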

USING LEGACY CODE

A foreign subroutine is one written in C or Java. A major advantage of coordination-centric design is that such legacy code can be intermingled with new software components. The playing_check action code, below, communicates with the GUI controller via the Mp3_Control foreign call, which returns the identity of the button pressed. After activating the corresponding mode, the action generates the Check event.

    foreign function int Mp3_Control ();
    int result;
    result = Mp3_Control ();
    if ((result == 1)) {
        +Play;
    } else {
        if ((result == 2)) {
            +Pause;
        } else {
            if ((result == 3)) {
                +Stop;
            }
        }
    }
    ->Check ();

The rectangular box named XOR3 in Figure 3 is a user-defined constraint block. It enforces a mutual exclusion constraint between the modes Play, Pause, and Stop, which are connected to the inputs m1, m2, and m3, respectively. Constraints clearly declare and enforce the intended use of a component. In standard programs, in contrast, constraints are usually embedded into the structure of the code or contained in a comment to remind the developer and others about the intended use of a body of code. Also, note that like components and coordinators, constraints are reusable elements of a software design.
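As a rough illustration of what a three-input mutual exclusion constraint checks, the Python sketch below treats the constraint as "at most one of the three modes may be on." The article does not say whether XOR3 rejects a violation or switches the other modes off, so both helper functions (`xor3_violations` and `activate`) are assumptions made for this example.

```python
def xor3_violations(modes, m1="Play", m2="Pause", m3="Stop"):
    """Return the constrained modes that are on if more than one is on."""
    on = [m for m in (m1, m2, m3) if m in modes]
    return on if len(on) > 1 else []


def activate(modes, new_mode, group=("Play", "Pause", "Stop")):
    """Turn a mode on; if it belongs to the exclusive group, turn the
    other members of the group off (one possible enforcement policy)."""
    updated = set(modes)
    if new_mode in group:
        updated -= set(group)
    updated.add(new_mode)
    return updated
```

Declaring the rule once, outside the action code, is what makes it reusable: the same check applies to any three modes bound to m1, m2, and m3.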

Coordination-centric design provides a very powerful abstraction in the ability to share states among components without committing to a communication mechanism. Component and coordinator interactions occur via interfaces. Figure 3 shows the Controller’s graphical connection to its Control_Out interface. Graphically the modes Play, Pause, and Stop are bound to this interface and therefore shared with the MP3FrameReader via the coordinator ctrl_pipe, as shown in Figure 2.

Figure 4. Strata provides a visual tool, the evolution diagram, that lets you follow the state, control flow, and data flow of a software architecture, allowing time-saving debugging long before the mapping to actual hardware. Here the Decoder and MP3FrameReader exchange messages (blue arrows) via the Data_Pipe. At index 2, an event in the Controller requests that the Play mode become active (red arrow). The Play mode is activated at index 4, and this information is transmitted to the MP3FrameReader, whose Play mode also is activated.

INDEPENDENT SIMULATION

To validate the functional behavior of the design, we first simulated it independent of operating system and hardware issues. By debugging the design at this level of abstraction, we gained confidence in the algorithm’s correctness. In a traditional debugging environment, an error could be in the algorithm; it could be an improper use of the operating system or a communications protocol; it could be a hardware bug or a timing problem; and so on.

The developer must consider all the possibilities simultaneously. To ease the task, the coordination-centric framework provides visualizations called “evolution diagrams” that graphically illustrate interactions among the components and coordinators, as shown in Figure 4. (Note that this simulation isn’t a real-time execution of the system.) Each horizontal trace is a component or coordinator. Within a component or coordinator, colored horizontal bars represent the component’s or coordinator’s mode. Vertical bars are events. Colored arrows among the traces show control communications, message sends, and variable propagation. Of particular interest are the control and variable arrows, which show interaction behavior that wasn’t hand-coded by the developer but instead generated automatically.

The sound quality demonstrated by the model simulation proved that the architecture was functioning properly.

TARGET MAPPING

Confident in the correctness of the architecture, we created a Linux MP3 player by mapping the design to a single Linux process. Since coordinators are logical constructs, the ones in Figure 2 were optimized away during code generation, in this case because all interaction occurs in the shared address space of a Linux process. Code generation created commented C code and a corresponding make file. The code was then compiled with gcc to create the binary image. Because the simulation foreign subroutines were written for Linux, we reused them for the target. The resulting MP3 player executable played in real time.

As described earlier, we then distributed the design across a Linux PC and a C6711 DSP evaluation board connected by an Ethernet crossover cable. The distributed MP3 player plays MP3 songs by retrieving them from the Linux side, transmitting them to the DSP board, decoding them, and playing them through an external speaker.

Figure 5 shows the mapping table for the distributed player. We mapped the decoder component to the DSP board, the connecting coordinator to the TCP link, and the remaining control and file aspects to the Linux computer. Code generation created two sets of C files, one for the Linux side and one for the DSP/BIOS side, as well as TCP calls for Linux and for DSP/BIOS. Using Code Composer Studio, we compiled the C files for the DSP board and downloaded the binary image onto the board. We compiled the Linux side with gcc, then ran both executables and listened to an MP3-encoded song.

RESULTS

The memory requirements for the synthesized software are as follows: for the single-process Linux MP3 player, the Linux footprint was 233 KB; for the distributed MP3 player, the Linux footprint was 119 KB and the DSP footprint was 805 KB.

The large DSP footprint needs further explanation. Because we had essentially a “raw” board, the entire software image had to be downloaded. The TCP/IP stack provided with the development kit, downloaded to drive the Ethernet connection, is 301 KB (37% of the DSP image). Other DSP/BIOS libraries and objects required an additional 68 KB (8%). The DSP architecture requires the alignment of various data structures; it enforces the alignments by introducing “holes,” or gaps, in memory, which took up 285 KB (35%). We had to write initialization code to bring up the TCP/IP stack, which required 20 KB (2%). The legacy code, including the MP3 decoder algorithm itself, required 96 KB (12%), and the generated synthesized code 35 KB (4%). ◆
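The DSP image breakdown can be checked with a few lines of arithmetic. The sizes below are the ones reported in the text; the dictionary labels and variable names are simply chosen for this sketch.

```python
# Sizes in KB as reported for the DSP image.
dsp_image_kb = {
    "TCP/IP stack": 301,
    "other DSP/BIOS libraries and objects": 68,
    "alignment gaps": 285,
    "TCP/IP initialization code": 20,
    "legacy code (incl. MP3 decoder)": 96,
    "generated synthesized code": 35,
}

total_kb = sum(dsp_image_kb.values())
shares = {name: round(100 * kb / total_kb) for name, kb in dsp_image_kb.items()}

for name, kb in dsp_image_kb.items():
    print(f"{name:40s} {kb:4d} KB  {shares[name]:3d}%")
print(f"{'total':40s} {total_kb:4d} KB")
```

The six items add up to the 805 KB total, and the rounded shares match the quoted percentages; the generated code is the smallest contributor.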

David McCooey ([email protected]) is a senior software engineer at Consystant Design Technologies, Inc., focusing on platform differences and how to capture them in APIs. For the previous 12 years, he worked on the internals of the LynxOS RTOS at Lynx (later to become LynuxWorks). Before that, he was a member of the technical staff at AT&T Bell Labs, working on the Access Network System (ANS). Ken Hines ([email protected]) is Consystant’s chief scientist, a vice president, and co-chairman of the board; Ross Ortega ([email protected]) is chief technology officer, a vice president, and co-chairman of the board. Hines and Ortega founded Consystant based on their doctoral research.

Figure 5. A software architecture (bottom left) is mapped to a hardware architecture (bottom right) via a mapping table (top). To repartition the behavioral elements, you simply check the box associated with the computational resource available on the target. Here, the Decoder and audio_pipe are mapped to the DSP, the Data_Pipe is mapped to a TCP link, and the rest of the components are mapped to Linux.


Tone Relay

Passing Tones in Voice-over-Packet Data Systems

A tone relay can eliminate troublesome signaling tone distortion and other problems in packet data networks that employ speech compression.

By Scott Kurtz

Low-rate speech compression algorithms for voice-over-packet data networks can often distort signaling tones excessively. Tone relays and tone relay algorithms especially developed for voice-over-packet and other systems that employ speech compression enable you to avoid the problem. These algorithms address a number of subtle issues that affect robustness and performance. (Even when using one, however, you need to do a significant amount of testing under a variety of conditions to ensure robustness.) A good one should be flexible enough to handle the right set of signaling tones yet simple enough to integrate into a host application.

The idea of carrying speech over packet data networks is gaining acceptance in the world of telephony. Many standards bodies have scrambled to determine the best way to do that over the various types of packet data networks, like IP, ATM, and frame relay. All packet networks have limited bandwidth; for that reason, speech compression is an essential ingredient of voice-over-packet standards, in order to make the best use of the channels.

Speech compression algorithms remove redundancy from speech data by extracting key information from speech signals. This information often includes parameters that model the human vocal tract. A good model requires only a few bits to specify the parameters while still providing good speech quality when the speech signal is regenerated. In general, a higher degree of compression (lower bit rate) results in lower speech quality.

Figure 1. A communications system can employ a tone relay working in parallel with its speech coder. When tones are detected, the tone information is sent over the same communications network as the encoded speech data, sometimes replacing the speech data. At the decoding end, the tone relay decoder regenerates the tones, and the output replaces the decoded speech.

Unlike Morse and Huffman coding, speech compression algorithms result in the loss of information. Although good algorithms hold down the loss in perceived speech quality, the lower-rate algorithms aren’t adept at passing many nonspeech signals, including DTMF, MF R1, MF R2 Forward and Reverse, and Call Progress tones. In fact, many lower-rate speech compression algorithms distort signaling tones beyond the point of reliable detection.

SYSTEM OVERVIEW

A tone relay implementation consists of two layers of functionality: the relay encoding and decoding and the underlying tone detection and generation, as shown in Figure 1. The tone relay has a component that operates at the encoding side of the link, the tone relay encoder, and one that operates at the decoding side, the tone relay decoder. In the figure, the input is a sequence of samples representing the voiceband signal. Normally, the sampling rate is 8,000 samples per second.

The input samples feed both the speech encoder and the tone relay encoder. The speech encoder compresses the speech data, and the tone relay encoder detects the presence or absence of a signaling tone. If a tone is present, the tone relay encoder encodes information about the tone that enables it to be reproduced at the other end of the communications link. The tone relay encoder also issues an Active flag that, when set, indicates that valid tone data is present.

The speech data, tone data, and Active flag form the inputs to the encoded-packet processor, which creates a speech or tone packet in accordance with the appropriate voice-over-packet specification. The packet then goes to the opposite end of the link via the packet network.

The packet-decoding processor receives packets from the packet network and sends compressed speech data to the speech decoder. In some systems, speech data isn’t sent when tone data is sent. If that’s the case, the processor informs the speech decoder that the frame is missing and the speech decoder acts accordingly. The processor also sends the tone data and the Active flag to the tone relay decoder. If tone activity is present, the tone relay decoder regenerates the original tone based on the parameters included in the packet.

A switch determines whether the decoded speech or the regenerated tone data serves as the output. The tone relay decoder controls the switch, which is set to the speech decoder unless a tone is being regenerated.
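The data flow just described, from the encoded-packet processor to the output switch, can be summarized in a few lines of Python. This is a schematic sketch only: the `Packet` layout, the `drop_speech_with_tone` option, and the function names are invented for illustration and do not follow any particular voice-over-packet specification.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class Packet:
    """Hypothetical packet: not taken from any voice-over-packet standard."""
    active: bool                       # Active flag from the tone relay encoder
    speech: Optional[bytes] = None     # compressed speech frame, if sent
    tone: Optional[bytes] = None       # coded tone data, if any


def build_packet(speech: bytes, tone: Optional[bytes], active: bool,
                 drop_speech_with_tone: bool = True) -> Packet:
    """Encoding side: combine speech data, tone data, and the Active flag."""
    if active:
        speech_out = None if drop_speech_with_tone else speech
        return Packet(active=True, speech=speech_out, tone=tone)
    return Packet(active=False, speech=speech)


def select_output(packet: Packet, decoded_speech: Sequence[int],
                  regenerated_tone: Sequence[int]) -> Sequence[int]:
    """Decoding side: the switch stays on speech unless a tone is active."""
    return regenerated_tone if packet.active else decoded_speech
```

When `drop_speech_with_tone` is true, the packet models the systems in which speech data isn't sent alongside tone data, so the speech decoder would have to treat that frame as missing.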

TONE PASSER CHARACTERISTICS

On the surface, the design of a tone passer might appear straightforward. There are, however, some implementation details, such as frame size, that can have a significant effect on its performance.

Speech compression algorithms operate on a frame-by-frame basis. The input speech is therefore divided into frames. Each frame contains a given number of samples, as defined by the algorithm. Typical frame lengths and their corresponding sample counts are 2.5 ms (20 samples), 10 ms (80 samples), and 30 ms (240 samples).
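The sample counts follow directly from the 8,000 samples-per-second rate mentioned earlier. A minimal Python check (the function name is chosen for this sketch):

```python
SAMPLE_RATE_HZ = 8000  # the voiceband sampling rate given in the article


def samples_per_frame(frame_ms: float, rate_hz: int = SAMPLE_RATE_HZ) -> int:
    """Number of samples in a frame of the given length."""
    return round(frame_ms * rate_hz / 1000)


frame_sizes = {ms: samples_per_frame(ms) for ms in (2.5, 10, 30)}
```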

To reconstruct signaling-tone bursts more precisely, a tone relay should use finely specified durations. Consequently, the tone relay encoder must be able to analyze the input signal using small frame sizes so that the burst length is quantized in sufficiently small intervals.

Although a smaller interval allows a more accurately reconstructed tone, more processing power is needed to detect the smaller intervals. A trade-off is therefore necessary.

Figure 2. Too long a segment of tone passing through the vocoder can lead to adverse effects. Here, for an input sequence of two 50-ms tone pulses (a), the vocoder delays and distorts the signal (b). The duration of the pulses has increased in the decoded output, reducing the interdigit time, and the initial part of the pulses is still distorted (c). If the interdigit time is not met, the detector at the far end can make errors. Using leading-edge suppression eliminates the distortion and maintains the interdigit time (d).

A good tone relay should generate tone bursts with an accuracy of ±3 ms. The tone burst interval may not coincide with the frame size of the vocoder or with the analysis frame size of the tone detector in the tone relay encoder. Accordingly, the tone relay algorithm must work out those differences.
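One way to read the ±3 ms figure: if burst lengths are reported as a whole number of analysis steps and rounded to the nearest step, the worst-case error is half a step, so the step can be at most 6 ms (48 samples at 8,000 samples per second). The Python sketch below assumes this simple round-to-nearest scheme, which the article does not specify; the 5 ms step and the function names are illustrative.

```python
SAMPLE_RATE_HZ = 8000


def quantize_duration(samples: int, step_samples: int) -> int:
    """Round a burst length (in samples) to the nearest multiple of the step."""
    steps = (samples + step_samples // 2) // step_samples
    return steps * step_samples


def worst_case_error_ms(step_samples: int, rate_hz: int = SAMPLE_RATE_HZ) -> float:
    """Largest rounding error, in milliseconds, for a given step size."""
    return (step_samples // 2) * 1000 / rate_hz


step = 40                       # 5 ms analysis step (illustrative choice)
burst = 403                     # a 50.375 ms burst measured in samples
reported = quantize_duration(burst, step)
```

With a 40-sample step the worst-case error is 2.5 ms, inside the ±3 ms target; a 48-sample step sits exactly at the limit.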

Amplitude quantization is another important detail. Note that the tone relay quantizes both the time interval and the signal amplitude. Tone detectors in the network must detect signals over a wide range of amplitudes. Nevertheless, a tone relay should be designed so that the amplitude of the regenerated signal remains close to that of the original signal. The telephone network already contains losses, and the tone relay shouldn’t contribute any more. If a network is already near the upper limit of signal loss, the tone relay could push it over the top if the regenerated amplitude isn’t sufficiently accurate.

Once again, there’s a trade-off. As the number of amplitude quantization levels increases, the bandwidth required to pass the quantized levels increases. A good tone relay should be able to quantize the signal amplitude with quantization intervals of 3 dB or less.

The dynamic range of the amplitude quantizer should span the range called for by the associated signal detector specification. For example, if the detector specification indicates that a detector must detect signals between 0 and –25 dBm, the tone relay amplitude quantizer should span that range.
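To make the bandwidth trade-off concrete, the Python sketch below builds a uniform quantizer with 3 dB steps over the 0 to –25 dBm example range. It is an assumption-laden illustration rather than any product's scheme: uniform steps, round-to-nearest encoding, and the name `make_quantizer` are all choices made here.

```python
import math


def make_quantizer(top_dbm: float = 0.0, bottom_dbm: float = -25.0,
                   step_db: float = 3.0):
    """Build encode/decode functions for a uniform level quantizer in dB."""
    n_levels = math.ceil((top_dbm - bottom_dbm) / step_db) + 1

    def encode(level_dbm: float) -> int:
        index = round((top_dbm - level_dbm) / step_db)
        return min(max(index, 0), n_levels - 1)   # clamp to the valid range

    def decode(index: int) -> float:
        return top_dbm - index * step_db

    bits = max(1, (n_levels - 1).bit_length())     # bits needed per level
    return encode, decode, n_levels, bits


encode, decode, n_levels, bits = make_quantizer()
```

Under these assumptions the range needs 10 levels, which fit in 4 bits, and any level inside the range is reproduced to within half a step (1.5 dB). Halving the step size would roughly double the number of levels and add one bit per tone report.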

Leading-edge suppression is another parameter to worry about. To detect a signaling tone reliably without excessive probability of a false alarm, many samples of input data must be analyzed before deciding whether a signaling tone is present. The tone relay encoder output is therefore delayed with respect to the onset of the signaling tone. In a system that employs tone relay, the leading edge of the signaling tone could possibly pass through the vocoder before the tone relay detects the tone.
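The article does not name a detection algorithm. The Goertzel recurrence is one technique commonly used for DTMF detection, and it is used in the Python sketch below purely to illustrate why more samples give a more reliable decision: a short block cannot separate 697 Hz from the neighboring DTMF row frequency of 770 Hz, while a longer block can. The block lengths (40 and 205 samples) and helper names are illustrative assumptions.

```python
import math


def goertzel_power(samples, freq_hz, rate_hz=8000):
    """Signal power near freq_hz, computed with the Goertzel recurrence."""
    coeff = 2.0 * math.cos(2.0 * math.pi * freq_hz / rate_hz)
    s1 = s2 = 0.0
    for x in samples:
        s0 = x + coeff * s1 - s2
        s2, s1 = s1, s0
    return s1 * s1 + s2 * s2 - coeff * s1 * s2


def tone(freq_hz, n, rate_hz=8000):
    """Generate n samples of a unit-amplitude sinusoid (test signal)."""
    return [math.sin(2.0 * math.pi * freq_hz * i / rate_hz) for i in range(n)]


def selectivity(block):
    """Ratio of power at 697 Hz to power at the nearby 770 Hz frequency."""
    return goertzel_power(block, 697) / goertzel_power(block, 770)


short_block = tone(697, 40)     # 5 ms of a 697 Hz tone
long_block = tone(697, 205)     # about 25.6 ms of the same tone
```

Comparing `selectivity(short_block)` with `selectivity(long_block)` shows the cost of deciding early: the short block barely distinguishes the two frequencies, which is why a fast leading-edge decision is inherently less robust than the full detector.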

If a long enough segment of tone passes through the vocoder, two adverse effects may result, as shown in Figure 2. Here, a sequence of two 50-ms tone pulses is the input to the encoding side of a communication link (Figure 2a). The vocoder introduces both delay and distortion (Figure 2b).

The first effect is that the tone burst that emerges from the decoding side is extended in duration, and distortion remains at the start of the tone bursts (Figure 2c). The beginning of the output signal is the portion passed by the vocoder, and the tone relay decoder generates the remainder. As a result, the interdigit time is shortened when signaling tones occur in rapid succession. If the interdigit time isn’t met, the detector at the far end can make errors. Leading-edge suppression eliminates the effect, as well as the distortion (Figure 2d).

Figure 3. The tone relay encoder function includes tone detectors, which supply status information the encoder uses to produce coded tone data. Similarly, the tone relay decoder function includes a tone generator. The tone generator uses control information from the relay decoder to re-create the tone.

The second effect is similar to the first. Since it’s presumed that the vocoder distorts the signaling tone, the portion that it passes is distorted, but the portion that the tone relay passes is not. However, a phase discontinuity occurs between the two portions of the output signal. The discontinuity could cause the detector at the far end to detect a split digit or two separate digits for a single input digit.

Consequently, a tone relay should be capable of detecting the leading edge of a signaling tone as soon as possible. When the leading edge is detected, a tone suppressor should be used to remove the signaling tone’s frequency components from the signal going to the vocoder.

Tone suppression could be done more crudely by muting the input to the speech encoder. However, although muting is a simpler solution, it isn’t desirable. The leading-edge detector isn’t as robust as the detector itself because it must make a decision on a shorter-duration signal. The leading-edge detector is therefore more prone to false alarms. Muting the input to the speech encoder when the leading-edge detector goes off makes it more likely that speech will be muted, rather than the signaling tone. So it’s better to suppress only the frequency components that are part of the detected signaling tone. That way, if speech is present, the speech signal isn’t changed too much.
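The article does not say how the frequency components are removed. One plausible approach, assumed here for illustration, is a cascade of narrow notch filters, one per detected tone frequency. The Python sketch below uses a standard second-order notch with zeros on the unit circle; the pole radius `r = 0.95` and all function names are illustrative choices.

```python
import math


def notch(samples, freq_hz, rate_hz=8000, r=0.95):
    """Second-order IIR notch: zeros on the unit circle at freq_hz,
    poles at radius r just inside them."""
    c = math.cos(2.0 * math.pi * freq_hz / rate_hz)
    b1, a1, a2 = -2.0 * c, -2.0 * r * c, r * r
    x1 = x2 = y1 = y2 = 0.0
    out = []
    for x in samples:
        y = x + b1 * x1 + x2 - a1 * y1 - a2 * y2
        x2, x1 = x1, x
        y2, y1 = y1, y
        out.append(y)
    return out


def suppress_tone(samples, tone_freqs_hz, rate_hz=8000):
    """Cascade one notch per detected tone frequency."""
    for f in tone_freqs_hz:
        samples = notch(samples, f, rate_hz)
    return samples


def sine(freq_hz, n, rate_hz=8000):
    return [math.sin(2.0 * math.pi * freq_hz * i / rate_hz) for i in range(n)]


def tail_peak(samples, n=200):
    """Largest magnitude over the last n samples (after the filter settles)."""
    return max(abs(v) for v in samples[-n:])


dtmf_1 = [a + b for a, b in zip(sine(697, 2000), sine(1209, 2000))]
residual = tail_peak(suppress_tone(dtmf_1, [697, 1209]))
speech_like = tail_peak(suppress_tone(sine(300, 2000), [697, 1209]))
```

Once the filters settle, the 697 Hz and 1209 Hz pair is essentially removed, while a 300 Hz component passes at close to its original amplitude. That is the property that makes suppression gentler than muting when the leading-edge detector fires falsely during speech.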

FROM THEORY TO PRACTICE

Now let’s take a closer look at a tone relay implementation. Figure 3 breaks down the architecture shown in Figure 1.

The input PCM signal is fed into the tone relay encoder, which in turn feeds it to the various types of tone detectors. The tone detectors return status information including tone presence, early tone detection status, and signal amplitude. The tone relay encoder monitors the detector status information and sends coded tone data to the communications channel. The coded tone data includes information indicating which tone (if any) is present, the level of the tone, and the time elapsed since the start of the tone. The relay encoder also produces a suppressed version of the input PCM signal for use by the speech encoder.
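The three items carried in the coded tone data (which tone, its level, and the time since onset) fit comfortably in a few bytes. The layout in the Python sketch below is entirely hypothetical: one byte for a tone identifier, one for a quantized level index, and two for elapsed time in milliseconds. It is shown only to make the payload concrete.

```python
import struct

# Hypothetical 4-byte layout: tone id, level index, elapsed time in ms.
TONE_FORMAT = ">BBH"
NO_TONE = 0


def pack_tone_data(tone_id: int, level_index: int, elapsed_ms: int) -> bytes:
    """Serialize which tone is present, its level, and time since onset."""
    return struct.pack(TONE_FORMAT, tone_id, level_index, elapsed_ms)


def unpack_tone_data(payload: bytes):
    """Recover (tone_id, level_index, elapsed_ms) from a coded payload."""
    return struct.unpack(TONE_FORMAT, payload)


payload = pack_tone_data(tone_id=5, level_index=4, elapsed_ms=30)
```

Sending the elapsed time in every report, rather than only a start marker, would let a decoder recover the burst timing even if an earlier packet is lost.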

The coded tone data is fed to the relay decoder. The decoder uses the data to control the tone generator with a Tone Generator Control signal, which contains frequency, amplitude, and pulse duration information. The tone generator synthesizes the tone and sends it back to the relay decoder.

A carrier-class DSP-based tone relay software package can demonstrate how tone relay functionality can be integrated into a host application. For the discussion of such software, go to http://edtn.com/cs/EE/tonerelaysw.fhtml ◆

Scott Kurtz ([email protected]) is the vice president of engineering at Adaptive Digital Technologies, Inc. in Conshohocken, Pa., where he has focused on the development of the company’s DSP-based products and technology. He has specialized in digital signal processing and digital communications over the course of his 18-year career, beginning at RCA Corporation’s Government Communications Systems Division. At InterDigital Communications Corporation, he was instrumental in developing the first digital wireless local loop telephone system, which was the precursor to today’s digital cellular telephone systems. He holds six patents, with two more pending.


Launchings


Analog-I/O DSP Board Stars 16 Independent Channels

Toro, a PCI-based analog-I/O DSP board powered by a 150-MHz TMS320C6711 DSP, features 16 simultaneous, independent channels, plus 32 MB of one-wait-state SDRAM, and a wide choice of triggering modes. Each channel wields dedicated 16-bit, 250-kHz A/D and D/A converters, allowing for simultaneous, nonmultiplexed sampling. A built-in real-time event log precisely records trigger times and user-defined events. Multiple Toro boards can share triggers for additional channel capacity. Toro sells for $2,850. An 8-channel version costs $1,850. Innovative Integration, Inc., Simi Valley, Calif.; (805) 520-3300, www.innovative-dsp.com

Telecom Development Platform Pushes 2,400 MIPS

The TIGER 620x, a PCI-based development platform for TMS320C6202, C6203, and TMS320C6204 DSPs, allows processing throughputs of up to 2,400 MIPS and can run in a PC or as a stand-alone system. Suited for developing sophisticated telecom systems, it carries a 32-bit PCI interface, 16 MB of SDRAM, and 4 MB of flash memory and accepts H.100 bus connections and XBUS and EMIF daughtercards. It also includes a stereo audio and phone interface. Prices start at $3,350. DSP Research, Inc., Sunnyvale, Calif.; (408) 773-1042, www.dspr.com

Telecom Software Suites Add eXpressDSP-Compliant Packages

A line echo canceller for the TMS320C54x generation and an adaptive multirate (AMR) encoder-decoder for the TMS320C55x generation are the first in a series of eXpressDSP-compliant additions to the HelloVoice, HelloWireless, and HelloWLAN software suites. For release later this year are both line and acoustic echo canceller, G.729 A+B encoder-decoder, GPRS PHY layer, and DTMF packages for the C55x DSP, as well as an 802.11x PHY layer for the C64x DSP. License fees range from $75,000 to $125,000 for the encoder-decoder and from $50,000 to $150,000 for the echo canceller. Royalties are negotiable. HelloSoft, Inc., Campbell, Calif.; (408) 377-0110, www.hellosoft.com

Modem and Telephony Development Kits

The Modem Developers Kit and Client-side Telephony Developers Kit for the TMS320C54V90 and TMS320C54CST DSPs include an evaluation module with a 120-MHz DSP, 256 kB of RAM, DAA/line-interface circuitry, 1Mb-x-16 flash, six LEDs, and user-definable switches. Each also has a variety of standard interfaces, as well as a JTAG header to connect the kit to an emulator or use it with Code Composer Studio. Power supply, cables, and documentation and software on a CD-ROM are included as well. Both kits, which run on Windows 95, 98, NT 4.0, and 2000, are distributed in the U.K. by Kane Computing. Each kit sells for $495. Spectrum Digital, Inc., Stafford, Texas; (281) 494-4500, www.spectrumdigital.com; Kane Computing Ltd., Northwich, U.K.; +44-01606-351006, www.kanecomputing.com

Tone Detector–Generator Skirts Incompatibilities

A universal multifrequency tone detector and generator (UMTD/UMTG) for CPTD, DTMF, MF-R1, and MF-R2, as well as some fax and CID tones, is available for TMS320C54x DSPs. The software module complies with eXpressDSP Real-Time Software Technology, as well as the signaling standards of Europe, Asia, and the U.S., thereby overcoming the various incompatibilities. The detector portion also can serve as a simple spectrum analyzer. A worldwide nonexclusive object-code license costs $7,999. SPIRIT Corp., Moscow; +7 (095) 911-7654, www.spiritcorp.com

Protocol Stack Boasts eXpressDSP Compliance

A serial communications protocol algorithm, bf3Scp, helps to design Windows user interfaces for DSP applications. The architecture is based on an ActiveX host protocol engine and an eXpressDSP-compliant target-side protocol engine algorithm. It’s available for the TMS320C54x generation and TMS320C6000 DSP platform. A second package, bf3Net, is a TCP/IP protocol stack for the C54x that connects DSP algorithms to the Internet. Its modular architecture allows for customization; Code Composer Studio plug-ins speed configuration, code generation, and debugging. License fees are $7,500 for bf3Scp and $20,000 for bf3Net. Windmill Innovations, Nijkerk, The Netherlands; +31-33-246-5314, www.windmill-innovations.com


In recent years, the way that DSP software applications are developed has undergone a major transformation. With the number of functions performed by a DSP exploding, writing the majority of code in hand-optimized assembly language has become impractical, and most DSP code is now written in C. As applications continue to grow in complexity, developers are turning increasingly to real-time operating systems (RTOSs) as well. Indeed, DSP RTOSs are coming of age, but more is needed for complete solutions, in particular, device drivers, frameworks, and algorithms.

Three years ago, few DSP system developers used commercial RTOSs. Since then, their use has grown so rapidly that, for example, over half of Texas Instruments’ customers now use a commercial RTOS.

Two trends are driving the adoption of DSP RTOSs. In some applications, DSPs perform all system control functions in addition to signal processing and so need an RTOS. Cost-sensitive consumer electronic applications, such as MP3 players, where eliminating the microcontroller reduces the overall system cost, are good examples.

Most DSP applications, however, use one or more DSPs communicating with a microprocessor. In those cases, the DSP executes multiple signal processing functions, like speech compression and dial tone detection. For maintainable multifunctional systems, a multithreading DSP RTOS is ideal.

Given the size and performance constraints of most DSP applications, most developers today prefer to use a real-time DSP kernel rather than the more heavyweight RTOS kernels typical of microprocessor applications. Obviously, simply porting a standard microprocessor or microcontroller kernel to a DSP is inappropriate for most applications. Unlike their cousins, DSP kernels must scale to provide basic multithreading support in just a few kilobytes. In addition, interrupt latency and interrupt-to-task time are critical, since unlike microprocessors, DSPs almost always process streams of real-time data.

Now, the market is starting to demand more complete solutions. To meet those demands, DSP RTOSs need to evolve beyond simple kernels to address three key areas: device drivers, frameworks, and algorithms.

Device drivers are notoriously difficult to debug and test. Moreover, no DSP application can be seriously tested on the embedded target without working drivers. Working drivers for a DSP’s on-chip peripherals and commonly used external peripherals enable developers to bring up an application on real hardware much faster. They also eliminate the need for a significant effort to develop software that doesn’t differentiate the final product.

Frameworks are particularly critical for developing robust DSP applications quickly. They’re the glue that integrates algorithms, I/O, and control code and are ultimately responsible for managing system resources efficiently. By nature, frameworks are application-specific. Successful DSP RTOSs must therefore offer a range of frameworks optimized for different types of applications.

Frameworks must also provide standardized interfaces for integrating proprietary as well as third-party algorithms. With DSP applications now supporting multiple functions, more development teams are buying algorithms from third-party providers. By simplifying the process of integrating algorithms from multiple sources, frameworks can help reduce the system integration time, which already consumes much of the product development cycle.
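To make the idea of a standardized algorithm interface concrete, here is a minimal sketch in C. It is an illustration only, not the eXpressDSP algorithm standard or any TI API: the names AlgInstance, GainState, sat16, gain_process, dc_remove_process, and run_chain are all hypothetical. It assumes each algorithm, whatever its source, exposes one in-place block-processing function plus an opaque state pointer, so the framework can run a chain of them without knowing their internals.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical uniform interface (an assumption for illustration). */
typedef struct {
    const char *name;
    /* Process n samples in place; return 0 on success. */
    int (*process)(void *state, int16_t *buf, size_t n);
    void *state;   /* algorithm-private data, opaque to the framework */
} AlgInstance;

/* Saturate a 32-bit value to the 16-bit sample range. */
int16_t sat16(int32_t v)
{
    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}

/* Stand-in for an algorithm from one supplier: integer gain. */
typedef struct { int16_t gain; } GainState;

int gain_process(void *state, int16_t *buf, size_t n)
{
    const GainState *s = (const GainState *)state;
    if (s == NULL) return -1;
    for (size_t i = 0; i < n; i++)
        buf[i] = sat16((int32_t)buf[i] * s->gain);
    return 0;
}

/* Stand-in for an algorithm from another supplier:
 * remove the block mean (a crude DC-offset removal); needs no state. */
int dc_remove_process(void *state, int16_t *buf, size_t n)
{
    (void)state;
    if (n == 0) return 0;
    int64_t sum = 0;
    for (size_t i = 0; i < n; i++)
        sum += buf[i];
    int32_t mean = (int32_t)(sum / (int64_t)n);
    for (size_t i = 0; i < n; i++)
        buf[i] = sat16((int32_t)buf[i] - mean);
    return 0;
}

/* Framework side: run every algorithm in order through the same
 * interface, stopping at the first failure. */
int run_chain(const AlgInstance *chain, size_t count,
              int16_t *buf, size_t n)
{
    if (chain == NULL || buf == NULL) return -1;
    for (size_t i = 0; i < count; i++) {
        int rc = chain[i].process(chain[i].state, buf, n);
        if (rc != 0) return rc;
    }
    return 0;
}
```

In this sketch, adding or swapping an algorithm means changing one entry in the array passed to run_chain; the framework loop itself does not change. That is the kind of integration saving a common interface is meant to deliver.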

DSP RTOSs Come of Age, But Developers Need More

Nick Lethaby is product manager for the DSP/BIOS real-time kernel at Texas Instruments, Inc.’s Software Development System group in Santa Barbara, Calif.

On the Edge

By Nick Lethaby


DspIceLVx: PCI Emulator, Low Voltage, Multiprocessor

Features of the DspIceLVx include:

High-level language debugging support
Code Composer Studio, RTDX, DspBios, etc. support
Code Composer Studio PDM (Parallel Debug Manager) support
TI DSPs and ARM support (simultaneously)
Variable TCLK from 20 MHz to 158 Hz
1.0 V - 5.0 V processors (auto-detect)
Supports 1 processor JTAG scan chain as per XDS510, with multiple processors in a “daisy chain”
AND/OR
CCS support for up to 16 separate JTAG scan chains (as per XDS510)
Heterogeneous (mixed) and homogeneous processor support
Rugged compact design
PCI bus PnP support

Supports up to 16 separate JTAG scan chains (not daisy chain), with multiple processors in a “daisy chain”, all from one PCI slot

Software Support

TI Code Composer Studio CCS1.2, CCS2.0

Processor Support

[Table: processor support matrix for the DspIceLVx, DspIceLV, and DspIce(mpsd) emulators. Interfaces: JTAG (TI_14Pin) and MPSD_12PIN. Targets: TMS470 (ARM7, ARM9), OMAP, and TMS320 DSP families C20x, C24x, C27xx, C28xx, C30-C32, C33, C54xx, C55xx, C62xx, C64xx, C67xx.]

Softronics Emulators

- xICE* -> PCI Bus Multi-Target Emulator
- nICE -> 100baseT Ethernet Emulator
- Ice*Pack -> ISA (16 bit) Bus Emulator
- MIce*Pack -> Parallel Port Emulator
- DspIce* -> PCI Bus Emulator

Softronics
sales@softronx.com
www.softronx.com
Ph +61 500505059
Fx +61 500505049

Texas Instruments trademarks: XDS510, TMS320, TMS470, RTDX, DSPBios, Code Composer Studio
ARM Limited trademarks: ARM7, ARM9


IMPORTANT NOTICE

Texas Instruments Incorporated and its subsidiaries (TI) reserve the right to make corrections, modifications, enhancements, improvements, and other changes to its products and services at any time and to discontinue any product or service without notice. Customers should obtain the latest relevant information before placing orders and should verify that such information is current and complete. All products are sold subject to TI’s terms and conditions of sale supplied at the time of order acknowledgment.

TI warrants performance of its hardware products to the specifications applicable at the time of sale in accordance with TI’s standard warranty. Testing and other quality control techniques are used to the extent TI deems necessary to support this warranty. Except where mandated by government requirements, testing of all parameters of each product is not necessarily performed.

TI assumes no liability for applications assistance or customer product design. Customers are responsible for their products and applications using TI components. To minimize the risks associated with customer products and applications, customers should provide adequate design and operating safeguards.

TI does not warrant or represent that any license, either express or implied, is granted under any TI patent right, copyright, mask work right, or other TI intellectual property right relating to any combination, machine, or process in which TI products or services are used. Information published by TI regarding third-party products or services does not constitute a license from TI to use such products or services or a warranty or endorsement thereof. Use of such information may require a license from a third party under the patents or other intellectual property of the third party, or a license from TI under the patents or other intellectual property of TI.

Reproduction of information in TI data books or data sheets is permissible only if reproduction is without alteration and is accompanied by all associated warranties, conditions, limitations, and notices. Reproduction of this information with alteration is an unfair and deceptive business practice. TI is not responsible or liable for such altered documentation.

Resale of TI products or services with statements different from or beyond the parameters stated by TI for that product or service voids all express and any implied warranties for the associated TI product or service and is an unfair and deceptive business practice. TI is not responsible or liable for any such statements.

Mailing Address:

Texas Instruments
Post Office Box 655303
Dallas, Texas 75265

Copyright 2002, Texas Instruments Incorporated