ECE 320 - Spring 2003

Collection Editor: Douglas L. Jones

Authors:

Swaroop Appadwedula
Matthew Berry
Mark Butala
Mark Haun
Jake Janovetz
Douglas L. Jones
Michael Kramer
Jason Laska
Dima Moussa
Daniel Sachs
Brian Wade

Online: <http://cnx.org/content/col10096/1.2/>

CONNEXIONS

Rice University, Houston, Texas

This selection and arrangement of content as a collection is copyrighted by Douglas L. Jones. It is licensed under the Creative Commons Attribution 1.0 license (http://creativecommons.org/licenses/by/1.0).

Collection structure revised: January 22, 2004

PDF generated: October 25, 2012

For copyright and attribution information for the modules contained in this collection, see p. 135.

Table of Contents

Preface for U of I DSP Laboratory

ECE 320 Course Overview

Announcements

1 Required Labs
1.1 Lab 0
1.2 Lab 1
1.3 Lab 2
1.4 Lab 3
1.5 Lab 4
1.6 Lab 5

2 Project Labs
2.1 Digital Receiver
2.2 Audio Effects
2.3 Surround Sound
2.4 Adaptive Filtering
2.5 Speech Processing
2.6 Video Processing

3 General References
3.1 Processor
3.2 Core File
3.3 Code Composer

Bibliography
Index
Attributions

Available for free at Connexions <http://cnx.org/content/col10096/1.2>

Preface for U of I DSP Laboratory [1]

This text builds on over fourteen years of DSP laboratory instruction and over ten years of collaborative development of instructional laboratory materials. The content has evolved in tandem with ECE 320: Digital Signal Processing Laboratory, a senior-level, two-credit-hour elective laboratory course at the University of Illinois at Urbana-Champaign, and to a large extent reflects its goals and structure. The material is nonetheless well suited for a variety of course organizations, and earlier versions of the material have been used with success at the University of Washington and elsewhere.

This text could be effectively used with several types of course structures, including

• a semester-long project-oriented DSP laboratory,
• a quarter- or semester-long DSP laboratory structured around weekly laboratory exercises,
• a hands-on laboratory supplement as part of a signal processing theory course,
• a self-study course in DSP implementation.

ECE 320 at the University of Illinois represents the first type of course. It consists of two roughly equal parts. The first is a series of weekly laboratory assignments, including an introduction to the Texas Instruments TMS320C549 microprocessor and DSP development environment, real-time FIR, IIR, and multirate filtering, spectral analysis using the FFT, and a digital communications transmitter. Students work together in pairs on these laboratory assignments and are orally quizzed individually after completing each weekly laboratory assignment. The materials for each week are a semi-self-paced tutorial with three major parts: a review of the signal processing concepts, a design or familiarization exercise (often MATLAB-based), and a real-time implementation assignment using the TMS320C549 microprocessor. In the second part, after completing these common modules in mid-semester, student teams conceive a substantial real-time DSP project of their choice and spend the remainder of the semester designing, simulating, implementing, and testing it. Supplementary modules introducing students to the basics of digital communication (including phase-locked loops and delay-locked loops), adaptive filtering, speech processing, and audio signal processing accelerate students' progress on projects in these areas.

A course emphasizing signal processing algorithms might forgo a major project and instead use the supplementary modules to complete a quarter or semester of weekly laboratory assignments. A one-hour hands-on laboratory supplement to a signal processing lecture course could stretch the first few units (e.g., through spectral analysis) over a semester, thereby reinforcing and enhancing students' understanding of the core signal processing theory and algorithms. Due to the self-paced, tutorial nature of the materials, a student can independently learn the aspects of real-time DSP implementation that interest them; students in our senior independent design course at the University of Illinois have successfully used the materials in this manner.

The laboratory materials and assignments reflect our belief that a thorough instruction in signal processing implementation requires exposure to assembly-language programming of fixed-point DSP microprocessors, as this represents an important component of current and at least near-future industrial practice. Instructors with other goals or perspectives may find most of the tutorial, design material, and assignments relevant even if they choose compiler-based or non-real-time implementation. Laboratories using different development systems or different DSP microprocessors will likely find almost all of the material well suited for their needs; only the hardware-specific language and instructions need be modified. Earlier versions of this material have been used with several different DSP microprocessors and development boards based on the Motorola DSP56000 and the Texas Instruments TMS320 families.

[1] This content is available online at <http://cnx.org/content/m10681/2.12/>.

Connexions is an ideal venue for this text for several reasons. DSP hardware and development tools are evolving very rapidly, so a textbook produced through conventional publishers is likely to be almost obsolete before it is printed. Every university has a unique set of equipment, curriculum, and students, necessitating site-specific specialization of laboratory instructional material; conventional publishing is unable to produce textbooks cost-effectively with the rapid turnaround and low volumes thus required. We have always made our materials open, available, and free to other institutions to use in their own laboratory course development, so the open-source spirit of the Connexions project reflects our own philosophy and should more easily enable others to build on our experience. Finally, this material was created, modified, rewritten, and enhanced by a large and changing group of authors over a period of years in response to new ideas and evolving needs, goals, and equipment; its development thus embodies the Connexions philosophy.

The development of these materials would not have been possible without the active support and encouragement of many people and organizations. First, we express our gratitude to the corporations, particularly Texas Instruments, Motorola, and Hewlett-Packard/Agilent, whose generosity has equipped our instructional laboratory with state-of-the-art DSP development systems and instruments; our laboratory course would not be possible without their support. It would also have been impossible without the active support of the departmental leadership and the staff of the Electrical and Computer Engineering department, and particularly Dan Mast, for supporting, designing, equipping, and maintaining our instructional laboratory. We thank the Connexions team for their very substantial help in "connexifying" our materials, including conversion of the majority of the material into CNXML and MathML format; without their efforts, the text in this form would not exist. Support from the National Science Foundation in recent years enables continuing development of the course in response to student and industry needs. Most importantly, we are grateful to the generations of teaching assistants and students who have taught and learned from these materials over the past decade or more; it is their hard work, creative input, and dynamic interaction that have yielded this result.

ECE 320 Course Overview [2]

Introduction

The intent of this course is to familiarize students with the fundamentals of operating and analyzing real-time digital signal processing (DSP) systems, including the theory required, the hardware used to sample and process the signals, and the software environment used to control the system. The theory is primarily those DSP concepts covered in ECE 310, including sampling, convolution, filtering, filter design, modulation, and multirate processing (interpolation and decimation). The DSP hardware consists of analog-to-digital (A/D) and digital-to-analog (D/A) converters and a TI TMS320C549 DSP to perform the processing.

References

Staff

Lecturer

• Professor Yoram Bresler ([email protected])

Teaching Assistants

• Mark Butala ([email protected])
• Michael Frutiger ([email protected])
• Rob Morrison ([email protected])
• Ted Zhang ([email protected])

Meetings

• Lecture: 165 Everitt Lab; Mon 2:00-3:00
• Lab: 251 Everitt Lab; Mon, Fri 3:00-5:00 and Tue, Wed, Thurs 2:00-4:00; phone 244-1360

Web page

http://www.ews.uiuc.edu/~ece320

warning: The Connexions version of the course material (what you are viewing now) will always be up to date; the same may not be true for the course material found at the above URL.

[2] This content is available online at <http://cnx.org/content/m10660/2.26/>.

Texts

Texas Instruments TMS320C54x DSP Reference Set, Volume 1: CPU and Peripherals [9], Volume 2: Mnemonic Instruction Set [10], and Volume 4: Applications Guide [11] are essential references. These documents are available in PDF format on the course web page in the "handouts" section and are also available in hardcopy in the lab. It is not necessary to purchase any texts for this course, and we ask that you do not print the manuals on the lab printer.

Office hours

Office hours will begin on Thursday, September 4. The TAs' office hours schedule is as follows:

Day of the Week   TA         Time
Monday            Frutiger   6-7 PM
Tuesday           Morrison   8-9 PM
Wednesday         Butala     4-5 PM
Thursday          Frutiger   6-7 PM
Friday            Zhang      2-3 PM

Table 1

All office hours, with the exception of Frutiger's Thursday office hour, will be held in the ECE 320 lab (251 EL).

note: Frutiger's Thursday office hour will address questions of a theoretical nature only and will be held at the Green Street Coffee House.

Schedule

The first half of the course consists of semi-self-paced labs in which you will learn the TI TMS320C549 DSP's architecture, assembly language, and compilers. For the second half of the semester, you will conceive and complete a real-time DSP-related project. Note in the schedule below that a "lab week" starts on a Tuesday and ends on the following Monday.

Dates             Lecture                 Lab                           Requirements
Aug. 27 - Sep. 3  Introduction            Lab 0: Lab Orientation
Sep. 4 - 10       TI Assembly Language    Lab 1: FIR Filtering          Prelab 1
Sep. 11 - 17      FIR / IIR Filters       Lab 2: IIR Filtering          Lab 1 quiz, Prelab 2
Sep. 18 - 24      Compilers               Lab 3: Multirate Filtering    Lab 2 quiz, Prelab 3
Sep. 25 - Oct. 1  FFTs                    Lab 4: Spectrum Analyzer      Lab 3 quiz, Prelab 4
Oct. 2 - 8        Digital Communications  Lab 5: Digital Communications Lab 4 quiz
Oct. 9 - 15       Special Topics          continue Lab 5                Prelab 5, Project abstract
Oct. 16 - 22      Special Topics          Project Lab 1                 Lab 5 quiz, Project proposal
Oct. 23 - 29      Special Topics          Project Lab 2                 Project quiz 1
Oct. 30 - Nov. 5  Special Topics          Project Topic Feedback        Project quiz 2
Nov. 6 - 12       Special Topics          Design Review Presentations   Design review slides
Nov. 13 - 19      Special Topics          Project                       Pass design review
Nov. 20 - Dec. 3  Special Topics          Project
Dec. 4 - 10       Special Topics          Project
Dec. 11-12        Project demonstrations
Dec. 14           Project reports

Table 2

[9] http://www-s.ti.com/sc/psheets/spru131g/spru131g.pdf
[10] http://www-s.ti.com/sc/psheets/spru172c/spru172c.pdf
[11] http://www-s.ti.com/sc/psheets/spru173/spru173.pdf

Grading

The structured laboratory segment will count for 50% of the total grade, based on completion of, and oral examination over, the weekly exercises, with each student quizzed individually. Labs are worth 10 points, usually with 1 point for prelab completion, 4 points for working code, and the remaining 5 points for quiz performance. We emphasize that grading in this class is based heavily on your demonstration of course material, rather than exams or submitted assembly code.

The project will count for 50% of the total grade: 20% of the total grade depends on the technical work on and oral demonstration of the project, 10% on the completeness and quality of the design review, 10% on the final report, and 5% on each of the two project labs.

It is expected that each student will attend and participate in scheduled class and laboratory meetings and report on progress, or will make other arrangements in advance with the instructor or teaching assistants. The final grade may be penalized if this does not occur.

Assignments

All graded assignments, including prelab exercises, DSP code, and final project materials, must be submitted to receive a grade in the course. All assignments other than the final report and presentation are due at the start of your scheduled laboratory meeting. You must have your code complete before the class begins on the day the lab will be quizzed. We reserve the right to consider code late if it is not complete before the start of the lab session in which it is due. A late penalty of 50% will be assessed for assignments less than a week late, and no credit will be given for assignments more than one week late. In addition to these policies, the final project abstract and project proposal must be submitted and approved before project labs or project work are accepted for grading. Similarly, the design review must be passed before demonstration, report submission, or grading of the final project is allowed.

Quizzes

All lab quizzes must be taken during the lab on the day the lab is due. You must take the quiz during your assigned laboratory period even if your lab assignment is not complete. Any quiz not taken during its assigned laboratory period will be assigned a zero grade unless other arrangements are made in advance with your section's teaching assistants or Professor Bresler. Exceptions to this policy will be granted only for excuses recognized by the College of Engineering.

note: One handwritten study sheet is allowed per quiz. Feel free to include any information you feel would be useful.

Lab Use

We will meet in the lab for two hours during the week at the assigned times. Because completion of your lab assignments will probably require additional lab time outside of the scheduled hours, you will be able to access the lab at any time during the semester using a keycard. Those students who currently have keycards should already have their cards activated for the DSP lab. If you are registered for the class and do not have a keycard, please request one in room 153 EL.

Basic rules of courtesy must be followed when in the lab. Please do not remove any lab equipment, books, or manuals from the lab at any time. Do not bring food or drink into the lab. If you would like to listen to music as you work, please use headphones.

Announcements [12]

[12] This content is available online at <http://cnx.org/content/m10841/2.34/>.

This Week in Lab

• Show progress on Project Lab 1.

Next Week in Lab

• Show progress on Project Lab 2.

Chapter 1

Required Labs

1.1 Lab 0

1.1.1 Lab 0: Hardware Introduction [1]

1.1.1.1 Introduction

This exercise introduces the hardware and software used in testing a simple DSP system. When you complete it, you should be comfortable with the basics of testing a simple real-time DSP system with the debugging environment you will use throughout the course. First, you will connect the laboratory equipment and test a real-time DSP system with pre-written code to implement an eight-tap (eight coefficient) finite impulse response (FIR) filter. With a working system available, you will then begin to explore the debugging software used for downloading, modifying, and testing code. Finally, exercises are included to refresh your familiarity with MATLAB.

1.1.1.2 Lab Equipment

This exercise assumes you have access to a laboratory station equipped with a Texas Instruments TMS320C549 digital signal processor chip mounted on a Spectrum Digital TMS320LC54x evaluation board. The DSP evaluation module should be connected to a PC running Windows and will be controlled using the PC application Code Composer Studio, a debugger and development environment. Mounted on top of each DSP evaluation board is a Spectrum Digital surround-sound module employing a Crystal Semiconductor CS4226 codec. This board provides two analog input channels and six analog output channels at the CD sample rate of 44.1 kHz. The DSP board can also communicate with user code or a terminal emulator running on the PC via a serial data interface.

In addition to the DSP board and PC, each laboratory station should also be equipped with a function generator to provide test signals and an oscilloscope to display the processed waveforms.

1.1.1.2.1 Step 1: Connect cables

Use the provided BNC cables to connect the output of the function generator to input channel 1 on the DSP evaluation board. Connect output channels 1 and 2 of the board to channels 1 and 2 of the oscilloscope. The input and output connections for the DSP board are shown in Figure 1.1 (Example Hardware Setup).

[1] This content is available online at <http://cnx.org/content/m11019/2.7/>.

Figure 1.1: Example Hardware Setup. The diagram shows the function generator output (Out), the DSP evaluation board with inputs 1-2 and outputs 1-6, and the oscilloscope channels Ch1 and Ch2.

Note that with this configuration, you will have only one signal going into the DSP board and two signals coming out. The output on channel 1 is the filtered input signal, and the output on channel 2 is the unfiltered input signal. This allows you to view the raw input and filtered output simultaneously on the oscilloscope. Turn on the function generator and the oscilloscope.

1.1.1.2.2 Step 2: Log in

Use the network ID and password provided to log into the PC at your laboratory station.

When you log in, two shared networked drives should be mapped to the computer: the W: drive, which contains your own private network work directory, and the V: drive, where the necessary files for ECE 420 are stored. Be sure to save any files that you use for the course to the W: drive. Temporary files may be stored in the C:\TEMP directory; however, since files stored on the C: drive are accessible to any user, are local to each computer, and may be erased at any time, do not store course files on the C: drive. On the V: drive, the directories v:\ece420\54x\dsplib\ and v:\ece420\54x\dsptools\ contain the files necessary to assemble and test code on the TI DSP evaluation boards.

Although you may want to work exclusively in one lab partner's network account, you should be sure that both partners have copies of the lab assignment assembly code.

warning: Not having the assembly code during a quiz because "it's on my partner's account" is NOT a valid excuse!

To copy files between partners' directories on W:, or to work outside the lab, FTP access to your files is available at ftp://elalpha.ece.uiuc.edu.

1.1.1.3 The Development Environment

The evaluation board is controlled by the PC through the JTAG interface (XDS510PP) using the application Code Composer Studio. This development environment allows the user to download, run, and debug code assembled on the PC. Work through the steps below to familiarize yourself with the debugging environment and real-time system using the provided FIR filter code (Steps 3, 4 and 5), then verify the filter's frequency response with the subsequent MATLAB exercises (Steps 6 and 7).

1.1.1.3.1 Step 3: Assemble filter code

Before you can execute and test the provided FIR filter code, you must assemble the source file. First, bring up a DOS prompt window and create a new directory to hold the files, and then copy them into your directory:

• w:

• mkdir lab0

• cd lab0

• copy v:\ece420\54x\dsplib\filter.asm .

• copy v:\ece420\54x\dsplib\coef.asm .

Next, assemble the filter code by typing asm filter at the DOS prompt. The assembling process first includes the FIR filter coefficients (stored in coef.asm) into the assembly file filter.asm, then compiles the result to produce an output file containing the executable binary code, filter.out.

1.1.1.3.2 Step 4: Verify filter execution

With your filter code assembled, double-click on the Code Composer icon to open the debugging environment. Before loading your code, you must reset the DSP board and initialize the processor mode status register (PMST). To reset the board, select the Reset option from the Debug menu in the Code Composer application.

Once the board is reset, select the CPU Registers option from the View menu, then select CPU Register. This will open a sub-window at the bottom of the Code Composer application window that displays several of the DSP registers. Look for the PMST register; it must be set to the hexadecimal value FFE0 to have the DSP evaluation board work correctly. If it is not set correctly, change the value of the PMST register by double-clicking on the value and making the appropriate change in the Edit Register window that comes up.

Now, load your assembled filter file onto the DSP by selecting Load Program from the File menu. Finally, reset the DSP again, and execute the code by selecting Run from the Debug menu.

The program you are running accepts input from input channel 1 and sends output waveforms to output channels 1 and 2 (the filtered signal and raw input, respectively). Note that the "raw input" on output channel 2 may differ from the actual input on input channel 1, because of distortions introduced in converting the analog input to a digital signal and then back to an analog signal. The A/D and D/A converters on the six-channel surround board operate at a sample rate of 44.1 kHz and have an anti-aliasing filter and an anti-imaging filter, respectively, that in the ideal case would eliminate frequency content above 22.05 kHz. The converters on the six-channel board are also AC coupled and cannot pass DC signals. On the basis of this information, what differences do you expect to see between the signals at input channel 1 and at output channel 2?

Set the amplitude on the function generator to 1.0 V peak-to-peak and the waveform to sinusoidal. Observe the frequency response of the filter by sweeping the input signal through the relevant frequency range. What is the relevant frequency range for a DSP system with a sample rate of 44.1 kHz?

Based on the frequency response you observe, characterize the filter in terms of its type (e.g., low-pass, high-pass, band-pass) and its -6 dB (half-amplitude) cutoff frequency (or frequencies). It may help to set the trigger on channel 2 of the oscilloscope since the signal on channel 1 may go to zero.

1.1.1.3.3 Step 5: Re-assemble and re-run with new filter

Once you have determined the type of filter the DSP is implementing, you are ready to repeat the process with a different filter by including different coefficients during the assembly process. Copy a second set of FIR coefficients over to your working directory with the following:
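The "-6 dB (half-amplitude)" equivalence can be checked with a line of arithmetic. The following is a minimal sketch in Python, used here only for illustration; the helper name `amplitude_ratio_to_db` is hypothetical and is not part of the lab software. It converts an amplitude ratio read off the oscilloscope into decibels.

```python
import math

def amplitude_ratio_to_db(v_out, v_in):
    """Return the gain in dB for output and input amplitudes in the same units."""
    return 20.0 * math.log10(v_out / v_in)

# Half amplitude: 0.5 V peak-to-peak out for 1.0 V peak-to-peak in.
half_amplitude_db = amplitude_ratio_to_db(0.5, 1.0)
print(round(half_amplitude_db, 2))  # -6.02
```

With the function generator set to 1.0 V peak-to-peak, the cutoff frequency is therefore the frequency at which the filtered output falls to about 0.5 V peak-to-peak, assuming a passband gain close to one.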

• copy coef.asm coef1.asm

• copy v:\ece420\54x\dsplib\coef2.asm coef.asm

You can now repeat the assembly and testing process with the new filter using the asm instruction at the DOS prompt and repeating the steps required to execute the code discussed in Step 4 (Section 1.1.1.3.2: Step 4: Verify filter execution).

Just as you did in Step 4 (Section 1.1.1.3.2: Step 4: Verify filter execution), determine the type of filter you are running and the filter's -6 dB point by testing the system at various frequencies.

1.1.1.3.4 Step 6: Check filter response in MATLAB

In this step, you will use MATLAB to verify the frequency response of your filter by copying the coefficients from the DSP to MATLAB and displaying the magnitude of the frequency response using the MATLAB command freqz.

The FIR filter coefficients included in the file coef.asm are stored in memory on the DSP starting at location (in hex) 0x1000, and each filter you have assembled and run has eight coefficients. To view the filter coefficients as signed integers, select the Memory option from the View menu to bring up a Memory Window Options box. In the appropriate fields, set the starting address to 0x1000 and the format to 16-Bit Signed Int. Click "OK" to open a memory window displaying the contents of the specified memory locations. The numbers along the left-hand side indicate the memory locations.

In this example, the filter coefficients are placed in memory in decreasing order; that is, the last coefficient, h[7], is at location 0x1000 and the first coefficient, h[0], is stored at 0x1007.

Now that you can find the coefficients in memory, you are ready to use the MATLAB command freqz to view the filter's response. You must create a vector in MATLAB with the filter coefficients to use the freqz command. For example, if you want to view the response of the three-tap filter with coefficients -10, 20, -10, you can use the following commands in MATLAB:

• h = [-10, 20, -10];

• plot(abs(freqz(h)))

Note that you will have to enter eight values, the contents of memory locations 0x1000 through 0x1007, into the coefficient vector, h.

Does the MATLAB response agree with your experimental results? What might account for any differences?
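For readers without MATLAB at hand, the magnitude response that freqz plots can be approximated directly from the definition H(e^{jw}) = sum over n of h[n] e^{-jwn}. The following is a minimal Python sketch under stated assumptions: the function names `coefficients_from_memory` and `fir_magnitude_response` are hypothetical, and the 512-point grid from 0 up to (but not including) pi is chosen to mirror the default behavior of freqz rather than taken from the lab materials.

```python
import cmath
import math

def coefficients_from_memory(memory_words):
    """Reorder words read from 0x1000 upward (h[7] first) into h[0]..h[7]."""
    return list(reversed(memory_words))

def fir_magnitude_response(h, num_points=512):
    """Evaluate |H(e^{jw})| at w = pi*k/num_points for k = 0..num_points-1."""
    magnitudes = []
    for k in range(num_points):
        w = math.pi * k / num_points
        # H(e^{jw}) = sum over n of h[n] * e^{-j*w*n}
        response = sum(c * cmath.exp(-1j * w * n) for n, c in enumerate(h))
        magnitudes.append(abs(response))
    return magnitudes

# Three-tap example from the text above.
h = [-10, 20, -10]
mag = fir_magnitude_response(h)
print(round(mag[0], 6))    # gain at DC (w = 0)
print(round(mag[256], 6))  # gain at w = pi/2, one quarter of the sample rate
```

For this three-tap example the gain is zero at DC and rises with frequency, which is the signature of a high-pass filter. When working from a memory dump, pass the eight words through `coefficients_from_memory` first so that the vector starts with h[0].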

1.1.1.3.5 Step 7: Create new filter in MATLAB and verify

MATLAB scripts will be made available to you to aid in code development. For example, one of these scripts allows you to save filter coefficients created in MATLAB in a form that can be included as part of the assembly process without having to type them in by hand (a very useful tool for long filters). These scripts may already be installed on your computer; otherwise, download the files from the links as they are introduced.

First, have MATLAB generate a "random" eight-tap filter by typing h = gen_filt; at a MATLAB prompt. Then save this vector of filter coefficients by typing save_coef('coef.asm',flipud(h)); Make sure you save the file in your own directory. (The scripts that perform these functions are available as gen_filt.m [2] and save_coef.m [3].)

The save_coef MATLAB script will save the coefficients of the vector h into the named file, which in this case is coef.asm. Note that the coefficient vector is "flipped" prior to being saved; this is to make the coefficients in h fill DSP memory locations 0x1000 through 0x1007 in reverse order, as before.

You may now re-assemble and re-run your new filter code as you did in Step 5 (Section 1.1.1.3.3: Step 5: Re-assemble and re-run with new filter).

Notice when you load your new filter that the contents of memory locations 0x1000 through 0x1007 update accordingly.

1.1.1.3.6 Step 8: Modify filter coefficients in memory

Not only can you view the contents of memory on the DSP using the debugger, you can change the contents at any memory location simply by double-clicking on the location and making the desired change in the pop-up window.

Change the contents of memory locations 0x1000 through 0x1007 such that the coefficients implement a scale and delay filter with impulse response:

$$h[n] = 8192\,\delta[n-4] \qquad (1.1)$$

Note that the DSP interprets the integer value of 8192 as a fractional number by dividing the integer by 32,768 (that is, 2^15; the largest integer possible in a 16-bit two's complement register is 32,767). The result is an output that is delayed by four samples and scaled by a factor of 1/4. More information on the DSP's interpretation of numbers appears in Two's Complement and Fractional Arithmetic for 16-bit Processors (Section 3.1.1).

note: A clear and complete understanding of how the DSP interprets numbers is absolutely necessary to effectively write programs for the DSP. Save yourself time later by learning this material now!
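The fractional interpretation can be checked with a few lines of MATLAB. This is only a sketch: the helper name q15 is hypothetical and simply divides a 16-bit integer by 2^15.

```matlab
% Interpret 16-bit two's complement integers as fractions (divide by 2^15).
% q15 is a hypothetical helper name used only for this illustration.
q15 = @(k) k / 32768;

gain     = q15(8192);      % the scale factor used in equation (1.1)
most_neg = q15(-32768);    % most negative 16-bit value
most_pos = q15(32767);     % most positive 16-bit value

fprintf('8192   -> %g\n', gain);
fprintf('-32768 -> %g\n', most_neg);
fprintf('32767  -> %.10f\n', most_pos);
```

The representable range is therefore -1 up to, but not including, +1.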

After you have made the changes to all eight coefficients, run your new filter and use the oscilloscope to measure the delay between the raw (input) and filtered (delayed) waveforms.
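To know what to look for on the oscilloscope, a delay in samples can be converted to seconds by dividing by the sample rate. The sketch below assumes the 44.1 kHz sample rate quoted for this system in Lab 1; the delay you actually measure may also include fixed latency from the converters and from block buffering.

```matlab
Fs = 44100;               % sample rate in Hz (assumed, see text above)
delay_samples = 4;        % delay of the filter in equation (1.1)

delay_seconds = delay_samples / Fs;
fprintf('%d samples = %.1f microseconds\n', delay_samples, delay_seconds*1e6);
```

At this sample rate a four-sample delay is roughly 91 microseconds.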

What happens to the output if you change either the scaling factor or the delay value? How many seconds long is a six-sample delay?

1.1.1.3.7 Step 9: Test-vector simulation

As a final exercise, you will find the output of the DSP for an input specified by a test vector. Then you will compare that output with the output of a MATLAB simulation of the same filter processing the same input; if the DSP implementation is correct, the two outputs should be almost identical. To do this, you will generate a waveform in MATLAB and save it as a test vector. You will then run your DSP filter using the test vector as input and import the results back into MATLAB for comparison with a MATLAB simulation of the filter.

The first step in using test vectors is to generate an appropriate input signal. One way to do this is to use the MATLAB function sweep to generate a sinusoid that sweeps across a range of frequencies. The MATLAB function save_test_vector (available as save_test_vector.m; see footnote 4) can then save the sinusoidal sweep to a file you will later include in the DSP code.

Generate a sinusoidal sweep and save it to a DSP test-vector file using the following MATLAB commands:

>> t=sweep(0.1*pi,0.9*pi,0.25,500); % Generate a frequency sweep

2 See the file at <http://cnx.org/content/m11019/latest/gen_filt.m>
3 See the file at <http://cnx.org/content/m11019/latest/save_coef.m>
4 See the file at <http://cnx.org/content/m11019/latest/save_test_vector.m>


>> save_test_vector('testvect.asm',t); % Save the test vector

Next, use the MATLAB conv command to generate a simulated response by filtering the sweep with the filter h you generated using gen_filt above. Note that this operation will yield a vector of length 507 (which is n + m − 1, where n is the length of the filter and m is the length of the input). You should keep only the first 500 elements of the resulting vector.

>> out=conv(h,t); % Filter t with FIR filter h

>> out=out(1:500); % Keep first 500 elements of out
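The length bookkeeping above can be checked with a small stand-alone sketch. The filter h and input t below are hypothetical stand-ins for the outputs of gen_filt and sweep, which are not reproduced here. The sketch also shows that truncating the convolution to the input length matches what the built-in filter function returns for an FIR filter.

```matlab
h = [0.1 -0.2 0.3 0.4 0.4 0.3 -0.2 0.1];   % hypothetical 8-tap filter
t = cos(0.05*pi*(0:499).^2/500);           % hypothetical 500-sample input

out_full = conv(h, t);        % length 8 + 500 - 1 = 507
out      = out_full(1:500);   % keep the first 500 samples

out_alt  = filter(h, 1, t);   % causal FIR filtering, same length as t

disp(length(out_full));
disp(max(abs(out(:) - out_alt(:))));
```

The seven discarded samples are the tail of the response that continues after the input has ended.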

Now, modify the file filter.asm to use the alternative "test vector" core file, vectcore.asm (footnote 5). Rather than accepting input from the A/D converters and sending output to the D/A, this core file takes its input from, and saves its output to, memory on the DSP. The test vector is stored in a block of memory on the DSP evaluation board that will not interfere with your program code or data.

Note: The test vector is stored in the ".etext" section. See Core File: Introduction to Six-Channel Board for TI EVM320C54 (Section 3.2.1) for more information on the DSP memory sections, including a memory map.

The memory block that holds the test vector is large enough to hold a vector up to 4,000 elements long. The test vector stores data for both channels of input and from all six channels of output.

To run your program with test vectors, you will need to modify filter.asm. The assembly source is simply a text file and can be edited using the editor of your preference, including WordPad, Emacs, and VI. Replace the first line of the file with two lines. Instead of:

        .copy "v:\ece420\54x\dsplib\core.asm"

use:

        .copy "testvect.asm"
        .copy "v:\ece420\54x\dsplib\vectcore.asm"

Note that, as usual, the whitespace in front of the .copy directive is required.

These changes will copy in the test vector you created and use the alternative core file. After modifying your code, assemble it, then load and run the file using Code Composer as before. After a few seconds, halt the DSP (using the Halt command under the Debug menu) and verify that the DSP has halted at a branch statement that branches to itself. In the disassembly window, the following line should be highlighted: 0000:611F F073 B 611fh.

Next, save the test output file and load it back into MATLAB. This can be done by first saving 3,000 memory elements (six channels times 500 samples) starting with location 0x8000 in program memory. Do this by choosing File->Data->Save... in Code Composer Studio, then entering the filename output.dat and pressing Enter. Next, enter 0x8000 in the Address field of the dialog box that pops up, 3000 in the Length field, and choose Program from the drop-down menu next to Page. Always make sure that you use the correct length (six times the length of the test vector) when you save your results.

5 See the file at <http://cnx.org/content/m11019/latest/vectcore.asm>


Last, use the read_vector function (available as read_vector.m; see footnote 6) to read the saved result into MATLAB. Do this using the following MATLAB command:

>> [ch1, ch2] = read_vector('output.dat');

Now, the MATLAB vector ch1 corresponds to the filtered version of the test signal you generated. The MATLAB vector ch2 should be nearly identical to the test vector you generated, as it was passed from the DSP system's input to its output unchanged.

Note: Because of quantization error introduced in saving the test vector for the 16-bit memory of the DSP, the vector ch2 will not be identical to the MATLAB generated test vector. Furthermore, a bug in our test vector environment sometimes causes blocks of samples to be dropped, so the test vector output signal may have gaps.

After loading the output of the filter into MATLAB, compare the expected output (calculated as out above) and the output of the filter (in ch1 from above). This can be done graphically by simply plotting the two curves on the same axes; for example:

>> plot(out,'r'); % Plot the expected curve in red

>> hold on % Plot the next plot on top of this one

>> plot(ch1,'g'); % Plot the actual (DSP) curve in green

>> hold off

You should also ensure that the difference between the two outputs is near zero. This can be done by plotting the difference between the two vectors:

>> plot(out(1:length(ch1))-ch1); % Plot error signal

You will observe that the two sequences are not exactly the same; this is due to the fact that the DSP computes its response to 16 bits of precision, while MATLAB uses 64-bit floating point numbers for its arithmetic. Blocks of output samples may also be missing from the test vector output due to a bug in the test vector core. Nonetheless, the test vector environment allows one to run repeatable experiments using the same known test input for debugging.
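The size of the error introduced by storing a signal in 16-bit memory can be estimated in MATLAB. The sketch below assumes that values are rounded to the nearest multiple of 2^-15; whether save_test_vector rounds or truncates is not stated here, so treat the bound as illustrative. The signal t is a hypothetical stand-in for a real test vector.

```matlab
t  = 0.25 * sin(0.1*pi*(0:499));     % hypothetical test signal, |t| < 1
tq = round(t * 32768) / 32768;       % round to the nearest 16-bit fraction

err = tq - t;                        % quantization error per sample
fprintf('max |error| = %g (bound 2^-16 = %g)\n', max(abs(err)), 2^-16);
```

An error signal of roughly this size in the comparison plot is expected; errors that are orders of magnitude larger usually point to a bug or to dropped samples.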

6 See the file at <http://cnx.org/content/m11019/latest/read_vector.m>


1.2 Lab 1


1.2.1 Lab 1: Prelab7

1.2.1.1 Assembly Exercise

Analyze the following lines of code. Refer to Two's Complement and Fractional Arithmetic for 16-bit Processors (Section 3.1.1), Addressing Modes for TI TMS320C54x (Section 3.1.2), and the Mnemonic Instruction Set[?] manual for help.

FIR_len .set    3

; Assume:
;   BK  = FIR_len
;   AR0 = 1
;   AR2 = 1000h
;   AR3 = 1004h
;
;   FRCT = 1

        stl     A,*AR3+%
        rptz    A,(FIR_len-1)
        mac     *AR2+0%,*AR3+0%,A

Anything following a ";" is considered a comment. In this case, the comments indicate the contents of the BK register, the auxiliary registers AR0, AR2, and AR3, and the FRCT bit before the execution of the first instruction, stl. The line FIR_len .set 3 defines the name FIR_len as equal to 3. The BK register contains the length of the circular buffer we want to use. The % modifies the increment operator + so that the address register wraps around as a pointer into a circular buffer. This means that the address register is incremented until its offset from the start of the buffer reaches the value in BK, at which point it wraps back to the start of the buffer. When the increment operator + is followed by a 0, it increments by the value specified in register AR0.
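The wrap-around behavior can be explored with a small MATLAB model before stepping through the code by hand. This is a sketch under stated assumptions, not a description of the hardware taken from this lab: it assumes the buffer start is found by clearing the N low bits of the address, where 2^N is the smallest power of two greater than BK, and that the step is no larger than BK in magnitude. The names nbits, base, and circ_inc, the buffer length 5, and the address 2000h are all hypothetical.

```matlab
% Sketch of circular post-modification, assuming |step| <= bk.
nbits    = @(bk) floor(log2(bk)) + 1;                  % smallest N with 2^N > bk
base     = @(addr, bk) addr - mod(addr, 2^nbits(bk));  % N low bits cleared
circ_inc = @(addr, step, bk) base(addr, bk) + ...
           mod(mod(addr, 2^nbits(bk)) + step, bk);     % wrap the index modulo bk

bk   = 5;                     % hypothetical buffer length
addr = hex2dec('2000');       % hypothetical, suitably aligned start address
for k = 1:7
    fprintf('%s ', dec2hex(addr));   % print the address used on this pass
    addr = circ_inc(addr, 1, bk);    % post-increment with wrap-around
end
fprintf('\n');
```

A negative step, such as AR0 = -1, walks through the same buffer in the opposite direction and wraps from the first element to the last.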

Note that any number followed by an "h" or preceded with a 0x represents a hexadecimal value.

Example 1.1: 1000h and 0x1000 both refer to the decimal number 4096.

Assume that the data memory is initialized as follows starting at location 1000h.

7This content is available online at <http://cnx.org/content/m10022/2.22/>.


Figure 1.2: Data Memory Assignment (before execution)

After familiarizing yourself with the stl, rptz, and mac instructions, step through each line of code and record the values of the accumulator A and auxiliary registers AR2 and AR3 in the spaces provided in Figure 1.3. Additionally, record the value of the memory contents after all three instructions have been "executed" in the blank data memory table provided in Figure 1.4.

A             | AR2   | AR3   | Stage
00 0000 8000h | 1000h | 1004h | at start of code
              |       |       | after stl instruction
              |       |       | after rptz instruction
              |       |       | after first mac instruction
              |       |       | after second mac instruction
              |       |       | after third mac instruction

Figure 1.3: Execution Results

When working through the exercise, take into account that the accumulator A is a 40-bit register, and that the multiplier is in the fractional arithmetic mode. In this mode, integers on the DSP are interpreted as fractions, and the multiplier will treat them accordingly. This is done by shifting the result of the integer multiplier in the ALU left one bit. (All the arithmetic is fractional in these examples.) Multiplies performed by the ALU (via the mac instruction) produce a result that is twice what you would expect if you just multiplied the two integers together. DSP numerical representation and arithmetic are described further in Two's Complement and Fractional Arithmetic for 16-bit Processors (Section 3.1.1).
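The factor of two can be checked numerically. The following sketch uses hypothetical operands (0.5 and -0.25 as 16-bit fractions), unrelated to the memory contents in the exercise, and models the high word of the accumulator as the product divided by 2^16.

```matlab
a = 16384;                 % 0.5 as a 16-bit fraction (hypothetical operand)
b = -8192;                 % -0.25 as a 16-bit fraction (hypothetical operand)

p_int  = a * b;            % plain integer product
p_frct = 2 * p_int;        % fractional mode: product shifted left by one bit

high16 = floor(p_frct / 65536);    % upper 16 bits of the 32-bit result
fprintf('integer product      : %d\n', p_int);
fprintf('fractional product   : %d\n', p_frct);
fprintf('high word as fraction: %g\n', high16 / 32768);
```

The high word, read as a 16-bit fraction, equals 0.5 times -0.25. Without the one-bit shift it would come out a factor of two too small.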

Figure 1.4: Data Memory Assignment (after execution)


1.2.2 Lab 1: Lab8

1.2.2.1 Introduction

In this exercise, you will program in the DSP's assembly language to create FIR filters. Begin by studying the assembly code for the basic FIR filter filter.asm (footnote 9).

8 This content is available online at <http://cnx.org/content/m11020/2.6/>.
9 http://cnx.rice.edu/content/m10017/latest/filter.asm


filter.asm

        .copy   "core.asm"          ; Copy in core file
                                    ; This initializes DSP and jumps to "main"

FIR_len .set    8                   ; This is an 8-tap filter.

        .sect   ".data"             ; Flag following as data declarations

        .align  16                  ; Align to a multiple of 16
coef                                ; assign label "coef"
        .copy   "coef.asm"          ; Copy in coefficients

        .align  16
firstate
        .space  16*8                ; Allocate 8 words of storage for
                                    ; filter state.

        .sect   ".text"             ; Flag the following as program code
main
        ; Initialize various pointers
        stm     #FIR_len,BK         ; initialize circular buffer length
        stm     #coef,AR2           ; initialize coefficient pointer
        stm     #firstate,AR3       ; initialize state pointer
        stm     #1,AR0              ; initialize AR0 for pointer increment

loop
        ; Wait for a new block of 64 samples to come in
        WAITDATA

        ; BlockLen = the number of samples that come from WAITDATA (64)
        stm     #BlockLen-1, BRC    ; Put repeat count into repeat counter
        rptb    endblock-1          ; Repeat between here and 'endblock'

        ld      *AR6,16, A          ; Receive ch1 into A accumulator
        mar     *+AR6(2)            ; Rcv data is in every other channel
        ld      *AR6,16, B          ; Receive ch2 into B accumulator
        mar     *+AR6(2)            ; Rcv data is in every other channel

        ld      A,B                 ; Transfer A into B for safekeeping

        ; The following code executes a single FIR filter.

        sth     A,*AR3+%            ; store current input into state buffer
        rptz    A,(FIR_len-1)       ; clear A and repeat
        mac     *AR2+0%,*AR3+0%,A   ; multiply coef. by state & accumulate

        rnd     A                   ; Round off value in 'A' to 16 bits

        ; end of FIR filter code. Output is in the high part of 'A.'

        sth     A, *AR7+            ; Store filter output (from A) into ch1
        sth     B, *AR7+            ; Store saved input (from B) into ch2

        sth     B, *AR7+            ; Store saved input to ch3...ch6 also
        sth     B, *AR7+            ; ch4
        sth     B, *AR7+            ; ch5
        sth     B, *AR7+            ; ch6

endblock:
        b       loop

Figure 1.5


filter.asm applies an FIR filter to the signal from input channel 1 and sends the resulting output to output channel 1. It also sends the original signal to output channel 2.

First, create a work directory on your network drive for the files in this exercise, and copy filter.asm from v:\ece320\54x\dsplib to your work directory (this is the same file you worked with in Lab 0). Then, use MATLAB to generate two 20-tap FIR filters. The first filter should pass signals from 4 kHz to 8 kHz; the second filter should pass from 8 kHz to 12 kHz. For both filters, allow a 1 kHz transition band on each edge of the filter passband. To create these filters, first convert these band edges to digital frequencies based on the 44.1 kHz sample rate of the system, then use the MATLAB command remez to generate each filter; you can type help remez for more information. Use the save_coef command to save each of these filters into different files. (Make sure you reverse the vectors of filter coefficients before you save them.) Also save your filters as a MATLAB matrix, since you will need them later to generate test vectors. This can be done using the MATLAB save command. Once this is done, use the freqz command to plot the frequency response of each filter.
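The conversion from band edges in Hz to the normalized frequencies that remez expects can be sketched as follows. Placing the 1 kHz transition bands just outside the passband (3 to 4 kHz and 8 to 9 kHz) is an assumption made for illustration; the text does not say where the transition bands should sit. The remez call is left as a comment because it requires the Signal Processing Toolbox, and newer MATLAB releases name the same function firpm.

```matlab
Fs  = 44100;                       % sample rate in Hz
nyq = Fs / 2;                      % remez normalizes frequencies to Fs/2

% First filter: passband 4-8 kHz, with assumed transition bands
% at 3-4 kHz and 8-9 kHz.
edges_hz = [0 3000 4000 8000 9000 nyq];
f = edges_hz / nyq;                % 0 corresponds to DC, 1 to Fs/2
a = [0 0 1 1 0 0];                 % desired amplitude at each band edge

disp(f);
% With the toolbox available, a 20-tap design would then be
%   h = remez(19, f, a);           % order 19 gives 20 coefficients
```

The same steps with edges at 7, 8, 12, and 13 kHz would give a starting point for the second filter, under the same assumption about the transition bands.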

1.2.2.2 Part 1: Single-Channel FIR Filter

For now, you will implement only the filter with a 4 kHz to 8 kHz passband. Edit filter.asm to use the coefficients for this filter by making several changes.

First, the length of the FIR filter for this exercise is 20, not 8. Therefore, you need to change FIR_len to 20. FIR_len is set using the .set directive, which assigns a number to a symbolic name. You will need to change this to FIR_len .set 20.

Second, you will need to ensure that the .copy directive brings in the correct coefficients. Change the filename to point to the file that contains the coefficients for your first filter.

Third, you will need to modify the .align and .space directives appropriately. The TI TMS320C54x DSP requires that circular buffers, which are used for the FIR filter coefficient and state buffers, be aligned so that they begin at an address that is a multiple of a power of two greater than the length of the buffer. Since you are using a 20-tap filter (which uses 20-element state and coefficient buffers), the next greater power of two is 32. Therefore, you will need to align both the state and coefficient buffers to an address that is a multiple of 32. (16-element buffers would also require alignment to a multiple of 32.) This is done with the .align command. In addition, memory must be reserved for the state buffer. This is done using the .space directive, which takes as its input the number of bits of space to allocate. Therefore, to allocate 20 words of storage, use the directive .space 16*20 as shown below:

        .align  32                  ; Align to a multiple of 32
coef    .copy   "filter1.asm"       ; Copy FIR filter coefficients

        .align  32                  ; Align to a multiple of 32
state   .space  16*20               ; Allocate 20 words of data space
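The two numbers in these directives follow directly from the buffer length. As a quick check in MATLAB (the variable names are hypothetical):

```matlab
FIR_len = 20;                                 % buffer length in words

align_words = 2^(floor(log2(FIR_len)) + 1);   % smallest power of two > FIR_len
space_bits  = 16 * FIR_len;                   % .space takes a size in bits

fprintf('.align %d\n', align_words);
fprintf('.space %d   ; = 16*%d bits\n', space_bits, FIR_len);
```

Setting FIR_len to 16 in the same expression also gives an alignment of 32, matching the remark above about 16-element buffers.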

Assemble your code, set PMST to 0xFFE0, reset the DSP, and run. Ensure that it has the correct frequency response. After you have verified that this code works properly, proceed to the next step.

1.2.2.3 Part 2: Dual-Channel FIR Filters

First, make a copy of your modified filter.asm file from Part 1 (Section 1.2.2.2: Part 1: Single-Channel FIR Filter). Work from this copy; do not modify your working filter from the previous part. You will use that code again later.

Next, modify your code so that in addition to sending the output of your first filter (with a 4 kHz to 8 kHz passband) to output channel 1 and the unfiltered input to output channel 2, it sends the output of your second filter (with an 8 kHz to 12 kHz passband) to output channel 3. To do this, you will need to use the .align and .copy directives to load the second set of coefficients into data memory. You will also need to add instructions to initialize a pointer to the second set of coefficients and to perform the calculations for the second filter.

Exercise 1.2.2.1: Extra Credit Problem
One extra credit point will be awarded to you and your partner if you can implement the dual-channel system without using the auxiliary registers AR4 and AR5. Why is this more difficult? Renaming AR4 and AR5 using the .asg directive does not count!

Using the techniques introduced in DSP Development Environment: Introductory Exercise for TI TMS320C54x (footnote 10), generate an appropriate test vector and expected outputs in MATLAB. Then, using the test-vector core file also introduced in DSP Development Environment: Introductory Exercise for TI TMS320C54x (footnote 11), find the system's output given this test vector. In MATLAB, plot the expected and actual outputs of both filters and the difference between the expected and actual outputs. Why is the output from the DSP system not exactly the same as the output from MATLAB?

1.2.2.4 Part 3: Alternative Single-Channel FIR Implementation

An alternative method of implementing symmetric FIR filters uses the firs instruction. Modify your code from Part 1 (Section 1.2.2.2: Part 1: Single-Channel FIR Filter) to implement the filter with a 4 kHz to 8 kHz passband using the firs instruction.

Two differences in implementation between your code from Part 1 (Section 1.2.2.2: Part 1: Single-Channel FIR Filter) and the code you will write for this part are that (1) the firs instruction expects coefficients to be located in program memory instead of data memory, and (2) firs requires the states to be broken up into two separate circular buffers. Refer to the firs instruction on page 4-59 in the Mnemonic Instruction Set[?] manual, as well as a description and example of its use on pages 4-5 through 4-8 of the Applications Guide[?] for more information (Volumes 2 and 4 respectively of the TMS320C54x DSP Reference Set).
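The arithmetic idea behind a symmetric FIR implementation can be sketched in MATLAB: because h[k] = h[N-1-k], each coefficient can multiply the sum of two input samples, so only N/2 multiplies are needed per output. The sketch below illustrates that folding only. It does not model the firs instruction, its circular buffers, or its memory layout, and the filter and input are hypothetical.

```matlab
h = [1 -2 3 4 4 3 -2 1] / 16;      % hypothetical symmetric 8-tap filter
N = length(h);
x = sin(0.2*pi*(0:49));            % hypothetical input signal

y_direct = filter(h, 1, x);        % reference: one multiply per tap

xp = [zeros(1, N-1) x];            % prepend zeros for samples before n = 0
y_folded = zeros(size(x));
for n = 1:length(x)
    for k = 0:N/2-1
        newer = xp(n + N - 1 - k);         % x[n-k]
        older = xp(n + k);                 % x[n-(N-1-k)]
        % add the paired samples first, then multiply once
        y_folded(n) = y_folded(n) + h(k+1) * (newer + older);
    end
end

disp(max(abs(y_direct - y_folded)));
```

The two input samples in each pair move through the delay line in opposite directions as k increases, which is one reason a split state buffer is convenient.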

AR0 needs to be set to -1 for this code to work properly. Why?

note: COEFF is a label to the coefficients now expected to be in program memory. Refer to the firs description for more information.

        mvdd    *AR2,*AR3+0%            ; write x(-N/2) over x(-N)
        sth     A,*AR2                  ; write x(0) over x(-N/2)
        add     *AR2+0%,*AR3+0%,A       ; add x(0) and x(-(N-1))
                                        ; (prepare for first multiply)

        rptz    B,#(FIR_len/2-1)
        firs    *AR2+0%,*AR3+0%,COEFF
        mar     ???????                 ; Fill in these two instructions
        mar     ???????                 ; They modify AR2 and AR3.

                                        ; note that the result is now in the
                                        ; B accumulator

Because states and coefficients are now treated differently than in your previous FIR implementation, you will need to modify the pointer initializations to

10 "DSP Development Environment: Introductory Exercise for TI TMS320C54x" <http://cnx.org/content/m10017/latest/>
11 "DSP Development Environment: Introductory Exercise for TI TMS320C54x" <http://cnx.org/content/m10017/latest/>


        stm     #(FIR_len/2),BK     ; initialize circular buffer length
        stm     #firstate_,AR2      ; initialize location containing first
                                    ; half of states

        stm     #-1,AR0             ; Initialize AR0 to -1

        stm     #firstate2_,AR3     ; initialize location containing last half

Use the test-vector core file to find the output of this system given the same test vector you used to test the two-filter system. Compare the output of this code against the output of the same filter implemented using the mac instruction. Are the results the same? Why or why not? Ensure that the filtered output is sent to output channel 1, and that the unmodified output is still sent to output channel 2.

warning: You will lose credit if the unmodified output is not present or if the channels are reversed!

1.2.2.5 Quiz Information

The quiz for Lab 1 is broken down as follows:

• 1 point: Prelab (must be ready to show the TA the week before the quiz)
• 4 points: Working code: you must demonstrate that your code works using input from the function generator and that it works using input from appropriate test vectors. Have an .asm file ready to demonstrate each. Of the 4 points, you get 0.5 points for a single 20-tap filter, 2 points for the two-filter system, and 1.5 points for the system using the firs opcode.
• 5 points: Oral quiz score.
• 1 extra credit point: As described above (Exercise 1.2.2.1).

The oral quiz may cover signal processing material relating to FIR filters, including, but not limited to, the delay through FIR filters, generalized linear phase, and the differences between ideal FIR filters and realizable FIR filters. You may also be asked questions about digital sampling theory, including, but not limited to, the Nyquist sampling theorem and the relationship between the analog frequency spectrum and the digital frequency spectrum of a continuous-time signal that has been sampled.

The oral quiz will cover the code that you have written during the lab. You are expected to understand, in detail, all of the code in the files you have worked on, even if your partner or a TA wrote it. (You are not expected to understand the core file in detail.) The TA will ask you to explain various lines of code as part of the quiz. The TAs may also ask questions about 2's complement fractional arithmetic, circular buffers, alignment, and the mechanics of either of the two FIR filter implementations. You should be ready to trace through any of the code on paper and explain what each line of code does.

Use the TI documentation, specifically the Mnemonic Instruction Set[?] manual. Hard copies of this manual can also be found in the lab. Also, feel free to ask the TAs to help explain the code that you have been given.


1.3 Lab 2


1.3.1 Lab 2: Theory12

1.3.1.1 Introduction

Like finite impulse-response (FIR) filters, infinite impulse-response (IIR) filters are linear time-invariant (LTI) systems that can recreate a large range of different frequency responses. Compared to FIR filters, IIR filters have both advantages and disadvantages. On one hand, implementing an IIR filter with certain stopband-attenuation and transition-band requirements typically requires far fewer filter taps than an FIR filter meeting the same specifications. This leads to a significant reduction in the computational complexity required to achieve a given frequency response. However, the poles in the transfer function require feedback to implement an IIR system. In addition to inducing nonlinear phase in the filter (delaying different frequency input signals by different amounts), the feedback introduces complications in implementing IIR filters on a fixed-point processor. Some of these complications are explored in IIR Filtering: Filter-Coefficient Quantization Exercise in MATLAB (Section 1.3.3).

Later, in the processor exercise, you will explore the advantages and disadvantages of IIR filters by implementing and examining a fourth-order IIR system on a fixed-point DSP. The IIR filter should be implemented as a cascade of two second-order, Direct Form II sections. The data flow for a second-order, Direct Form II section, or bi-quad, is shown in Figure 1.6. Note that in Direct Form II, the states (delayed samples) are neither the input nor the output samples, but are instead the intermediate values w[n].

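[Block diagram: input x[n], gain G, intermediate signal w[n], two z^-1 delays, feedback taps -a1 and -a2, feedforward taps b1 and b2, output y[n]]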

Figure 1.6: Second-order, Direct Form II section

12This content is available online at <http://cnx.org/content/m10025/2.22/>.


1.3.2 Lab 2: Prelab (part 1)13

1.3.2.1

The transfer function for the second-order section shown in IIR Filtering: Introduction (Figure 1.6) is

$$H(z) = G\,\frac{1 + b_1 z^{-1} + b_2 z^{-2}}{1 + a_1 z^{-1} + a_2 z^{-2}} \qquad (1.2)$$

1.3.2.1.1 Exercise

First, derive the above transfer function. Begin by writing the difference equation for w[n] in terms of the input and past values (w[n−1] and w[n−2]). Then write the difference equation for y[n], also in terms of the past samples of w[n]. After finding the two difference equations, compute the corresponding Z-transforms and use the relation $H(z) = \frac{Y(z)}{X(z)} = \frac{Y(z)}{W(z)} \cdot \frac{W(z)}{X(z)}$ to verify the IIR transfer function in (1.2).

Next, design the coefficients for a fourth-order filter implemented as the cascade of two bi-quad sections. Write a MATLAB script to compute the coefficients. Begin by designing the fourth-order filter and checking the response using the MATLAB commands

[B,A] = ellip(4,.25,10,.25)

freqz(B,A)

note: MATLAB's freqz command displays the frequency responses of IIR filters and FIR filters. For more information about this, type help freqz. Be sure to look at MATLAB's definition of the transfer function.

note: If you use the freqz command as shown above, without passing its returned data to another function, both the magnitude (in decibels) and the phase of the response will be shown.

Next you must find the roots of the numerator (zeros) and the roots of the denominator (poles) so that you can group them to create two second-order sections. The MATLAB commands roots and poly will be useful for this task. Save the scripts you use to decompose your filter into second-order sections; they will probably be useful later.
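One possible way to use roots and poly for this grouping is sketched below on the denominator only. The fourth-order polynomial is hypothetical (it is built from two known sections so the result can be checked), and cplxpair is used to sort the roots into complex-conjugate pairs. The sketch assumes all roots are complex; real roots would need to be paired by hand.

```matlab
% Hypothetical fourth-order denominator, built from two known sections
A = conv([1 -0.5 0.25], [1 0.3 0.5]);

p  = cplxpair(roots(A));      % sort the poles into complex-conjugate pairs
A1 = real(poly(p(1:2)));      % denominator of the first second-order section
A2 = real(poly(p(3:4)));      % denominator of the second second-order section

disp(A1);
disp(A2);
disp(max(abs(conv(A1, A2) - A)));   % should be close to zero
```

The numerator can be split the same way. Which pair of zeros is combined with which pair of poles is a design choice that affects the gain needed in each section.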

Once you have obtained the coefficients for each of your two second-order sections, you are ready to choose a gain factor, G, for each section. As part of your MATLAB script, use freqz to compute the response $\frac{W(z)}{X(z)}$ with G = 1 for each of the sets of second-order coefficients. Recall that on the DSP we cannot represent numbers greater than or equal to 1.0. If the maximum value of $\left|\frac{W(z)}{X(z)}\right|$ equals or exceeds 1.0, an input with magnitude less than one could produce w[n] terms with magnitude greater than or equal to one; this is overflow. You must therefore select a gain value for each second-order section such that the response from the input to the states, $\frac{W(z)}{X(z)}$, is always less than one in magnitude. In other words, set the value of G to ensure that $\left|\frac{W(z)}{X(z)}\right| < 1$.
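A minimal sketch of this gain selection follows, assuming that G is applied at the input of the section so that the response from x[n] to w[n] with G = 1 is 1/A(z). The denominator is hypothetical, the response is sampled with fft instead of freqz, and the margin of 0.99 is an arbitrary choice for illustration.

```matlab
a = [1 -0.5 0.25];                 % hypothetical denominator [1 a1 a2]

Nfft = 4096;
Aw   = fft(a, Nfft);               % A(e^{jw}) sampled around the unit circle
peak = max(1 ./ abs(Aw));          % peak of |W/X| when G = 1

G = 0.99 / peak;                   % 0.99 is an arbitrary safety margin
fprintf('peak |W/X| at G=1: %g, chosen G: %g\n', peak, G);
```

Because the response is evaluated on a finite grid, a very sharp peak could be slightly underestimated; a denser grid or a larger margin guards against that.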

1.3.2.1.2 Preparing for processor implementation

13This content is available online at <http://cnx.org/content/m10623/2.11/>.

As the processor exercises become more complex, it will become increasingly important to observe good programming practices. Of these, perhaps the most important is careful planning of your program flow, memory and register use, and testing procedure. Write out pseudo-code for the processor implementation of a bi-quad. Make sure you consider the way you will store coefficients and states in memory. Then, to prepare for testing, compute the values of w[n] and y[n] for both second-order sections at n = {0, 1, 2} using the filter coefficients you calculated in MATLAB. Assume x[n] = δ[n] and all states are initialized to zero. You may also want to create a frequency sweep test-vector like the one in DSP Development Environment: Introductory Exercise for TI TMS320C54x (footnote 14) and use the filter command to find the outputs for that input. Later, you can recreate these input signals on the DSP and compare the output values it calculates with those you find now. If your program is working, the values will be almost identical, differing only slightly because of quantization effects, which are considered in IIR Filtering: Filter-Coefficient Quantization Exercise in MATLAB (Section 1.3.3).
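One way to produce such reference values is a sample-by-sample floating-point loop, which can also serve as a starting point for pseudo-code. This sketch assumes the gain G multiplies the input before the feedback sum, consistent with the gain discussion in the exercise above. All coefficient values are hypothetical, and no fixed-point effects are modeled.

```matlab
G = 0.5;  b1 = 0.4;  b2 = 1.0;  a1 = -0.5;  a2 = 0.25;   % hypothetical values

x  = [1 zeros(1, 9)];      % unit impulse, x[n] = delta[n]
w1 = 0;  w2 = 0;           % states w[n-1] and w[n-2], initialized to zero
w  = zeros(size(x));
y  = zeros(size(x));

for n = 1:length(x)
    w(n) = G*x(n) - a1*w1 - a2*w2;     % feedback half: new state value
    y(n) = w(n) + b1*w1 + b2*w2;       % feedforward half: output sample
    w2 = w1;                           % shift the delay line
    w1 = w(n);
end

y_ref = filter(G*[1 b1 b2], [1 a1 a2], x);   % reference from MATLAB
disp(max(abs(y - y_ref)));
```

The first three entries of w and y are the values at n = 0, 1, 2 for this hypothetical section.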

14"DSP Development Environment: Introductory Exercise for TI TMS320C54x" <http://cnx.org/content/m10017/latest/>


1.3.3 Lab 2: Prelab (part 2)15

1.3.3.1 Filter-Coefficient Quantization

One important issue that must be considered when IIR filters are implemented on a fixed-point processor is that the filter coefficients that are actually used are quantized from the "exact" (high-precision floating point) values computed by MATLAB. Although quantization was not a concern when we worked with FIR filters, it can cause significant deviations from the expected response of an IIR filter.

By default, MATLAB uses 64-bit floating point numbers in all of its computation. These floating point numbers can typically represent 15-16 digits of precision, far more than the DSP can represent internally. For this reason, when creating filters in MATLAB, we can generally regard the precision as "infinite," because it is high enough for any reasonable task.

note: Not all IIR filters are necessarily "reasonable"!

The DSP, on the other hand, operates using 16-bit fixed-point numbers in the range of -1.0 to $1.0 - 2^{-15}$. This gives the DSP only 4-5 digits of precision and only if the input is properly scaled to occupy the full range from -1 to 1.

For this exercise, you will examine how this difference in precision affects a notch filter generated using the butter command: [B,A] = butter(2,[0.07 0.10],'stop').

1.3.3.1.1 Quantizing coefficients in MATLAB

It is not difficult to use MATLAB to quantize the filter coefficients to the 16-bit precision used on the DSP. To do this, first take each vector of filter coefficients (that is, the A and B vectors) and divide by the smallest power of two such that the resulting absolute value of the largest filter coefficient is less than or equal to one. This is an easy but fairly reasonable approximation of how numbers outside the range of -1 to 1 are actually handled on the DSP.

Next, quantize the resulting vectors to 16 bits of precision by first multiplying them by $2^{15} = 32768$, rounding to the nearest integer (use round), and then dividing the resulting vectors by 32768. Then multiply the resulting numbers, which will be in the range of -1 to 1, back by the power of two that you divided out.
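The two steps above can be sketched as follows. The coefficient vector is hypothetical, and limiting the exponent to non-negative values (so that coefficients already inside the range are left unscaled) is an assumption.

```matlab
A = [1 -1.9 0.95];                          % hypothetical coefficient vector

p  = max(0, ceil(log2(max(abs(A)))));       % exponent of the scaling power of two
As = A / 2^p;                               % now max(abs(As)) <= 1
Aq = round(As * 32768) / 32768 * 2^p;       % quantize, then undo the scaling

fprintf('%.10f ', Aq);
fprintf('\n');
disp(max(abs(Aq - A)));                     % worst-case coefficient error
```

The coefficient error is bounded by 2^p times 2^-16, so every doubling of the scale factor doubles the worst-case error.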

1.3.3.1.2 Effects of quantization

Explore the effects of quantization by quantizing the filter coefficients for the notch filter. Use the freqz command to compare the response of the unquantized filter with two quantized versions: first, quantize the entire fourth-order filter at once, and second, quantize the second-order ("bi-quad") sections separately and recombine the resulting quantized sections using the conv function. Compare the response of the unquantized filter and the two quantized versions. Which one is "better?" Why do we always implement IIR filters using second-order sections instead of implementing fourth (or higher) order filters directly?

Be sure to create graphs showing the difference between the filter responses of the unquantized notch filter, the notch filter quantized as a single fourth-order section, and the notch filter quantized as two second-order sections. Save the MATLAB code you use to generate these graphs, and be prepared to reproduce and explain the graphs as part of your quiz. Make sure that in your comparisons, you rescale the resulting filters to ensure that the response is unity (one) at frequencies far outside the notch.
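Recombining two second-order sections with conv amounts to multiplying their polynomials in z^-1. The following C sketch shows that operation; the function name is our own and the code is offered as an illustration of what conv computes, not as part of the lab.

```c
#include <stddef.h>

/* Multiply two polynomials in z^-1, as MATLAB's conv does for vectors.
   out must have room for na + nb - 1 coefficients. */
void poly_conv(const double *a, size_t na,
               const double *b, size_t nb, double *out)
{
    size_t i, j;

    for (i = 0; i < na + nb - 1; i++)
        out[i] = 0.0;
    for (i = 0; i < na; i++)
        for (j = 0; j < nb; j++)
            out[i + j] += a[i] * b[j];
}
```

Two three-coefficient sections therefore produce the five coefficients of a fourth-order numerator or denominator.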

15This content is available online at <http://cnx.org/content/m10813/2.5/>.


1.3.4 Lab 2: Lab16

1.3.4.1 Implementation

On the DSP, you will implement the elliptic low-pass filter designed using the ellip command from IIR Filters: Filter-Design Exercise in MATLAB (Section 1.3.2). You should not try to implement the notch filter designed in IIR Filtering: Filter-Coefficient Quantization Exercise in MATLAB (Section 1.3.3), because it will not work correctly when implemented using Direct Form II. (Why not?)

To implement the fourth-order filter, start with a single set of second-order coefficients and implement a single second-order section. Make sure you write and review pseudo-code before you begin programming. Once your single second-order IIR is working properly you can then proceed to code the entire fourth-order filter.

1.3.4.1.1 Large coefficients

You may have noticed that some of the coefficients you have computed for the second-order sections are larger than 1.0 in magnitude. For any stable second-order IIR section, the magnitude of the "0" and "2" coefficients (a0 and a2, for example) will always be less than or equal to 1.0. However, the magnitude of the "1" coefficient can be as large as 2.0. To overcome this problem, you will have to divide the a1 and b1 coefficients by two prior to saving them for your DSP code. Then, in your implementation, you will have to compensate somehow for using half the coefficient value.
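One possible way to compensate, sketched here in C rather than assembly, is to accumulate the halved term twice. The function name, the sign convention (feedback terms subtracted, a0 normalized to one) and the use of a 64-bit accumulator to stand in for the DSP's wide accumulator are all assumptions made for this illustration.

```c
#include <stdint.h>

/* Feedback sum w[n] = x[n] - a1*w[n-1] - a2*w[n-2] for one section, with a1
   stored at half its value (a1_half = a1/2) so that it fits in [-1, 1).
   Accumulating the halved product twice restores the full a1 contribution.
   All 16-bit arguments are fractions scaled by 2^15. */
int64_t feedback_sum(int16_t x, int16_t a1_half, int16_t a2,
                     int16_t w1, int16_t w2)
{
    int64_t acc = (int64_t)x * 32768;     /* align input with the products */
    acc -= (int64_t)a1_half * w1;         /* first half of a1 * w[n-1]  */
    acc -= (int64_t)a1_half * w1;         /* second half of a1 * w[n-1] */
    acc -= (int64_t)a2 * w2;
    return acc;                           /* scaled by 2^30, not yet rounded */
}
```

With a1 = 1.5 (stored as 0.75) and w[n-1] = 0.5, the two accumulations together subtract 0.75, as the unhalved coefficient would.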

1.3.4.1.2 Repeating code

Rather than write separate code for each second-order section, you are encouraged first to write one section, then write code that cycles through the second-order section code twice using the repeat structure below. Because the IIR code will have to run inside the block I/O loop and this loop uses the block repeat counter (BRC), you must use another looping structure to avoid corrupting the BRC.

note: You will have to make sure that your code uses different coefficients and states during the second cycle of the repeat loop.

stm (num_stages-1),AR1

start_stage

; IIR code goes here

banz start_stage,*AR1-
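As pseudo-code for the loop above, the following C model cycles one section routine over an array of stages, so that each pass picks up its own coefficients and states. The structure layout and names are hypothetical, floating point is used for readability, and a0 is assumed to be normalized to one.

```c
#define NUM_STAGES 2

/* Coefficients and Direct Form II state for one second-order section. */
typedef struct {
    double b0, b1, b2;   /* numerator coefficients */
    double a1, a2;       /* denominator coefficients (a0 = 1 assumed) */
    double w1, w2;       /* delayed internal states w[n-1], w[n-2] */
} biquad_t;

/* Run one input sample through all stages in series. */
double cascade_step(biquad_t stages[NUM_STAGES], double x)
{
    int s;

    for (s = 0; s < NUM_STAGES; s++) {
        biquad_t *q = &stages[s];          /* this stage's own data */
        double w = x - q->a1 * q->w1 - q->a2 * q->w2;
        double y = q->b0 * w + q->b1 * q->w1 + q->b2 * q->w2;
        q->w2 = q->w1;                     /* age the states */
        q->w1 = w;
        x = y;                             /* feeds the next stage */
    }
    return x;
}
```

In assembly, indexing by stage corresponds to advancing the coefficient and state pointers rather than resetting them at the top of the loop.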

1.3.4.1.3 Gain

It may be necessary to add gain to the output of the system. To do this, simply shift the output left (which can be done using the ld opcode with its optional shift parameter) before saving the output to memory.
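A left shift by k bits is a gain of 2^k. The C sketch below models this; the saturation step is our own addition for illustration (a plain shift would wrap around on overflow), and the function name is hypothetical.

```c
#include <stdint.h>

/* Apply a gain of 2^shift to a 16-bit sample (shift <= 15 assumed),
   clipping instead of wrapping if the result exceeds 16 bits. */
int16_t apply_gain(int16_t sample, unsigned shift)
{
    int32_t v = (int32_t)sample * ((int32_t)1 << shift);

    if (v > INT16_MAX) return INT16_MAX;
    if (v < INT16_MIN) return INT16_MIN;
    return (int16_t)v;
}
```

Keep in mind that every bit of gain also removes one bit of headroom, so large outputs will clip sooner.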

1.3.4.2 Grading

Your grade on this lab will be split into three parts:

• 1 point: Prelab

16This content is available online at <http://cnx.org/content/m11021/2.4/>.


• 4 points: Code. Your DSP code implementing the fourth-order IIR filter is worth 3 points and the MATLAB exercise is worth 1 point.

• 5 points: Oral quiz. The quiz may cover differences between FIR and IIR filters, the prelab material, and the MATLAB exercise.


1.4 Lab 3


1.4.1 Lab 3: Theory17

1.4.1.1 Introduction

In the exercises that follow, you will explore some of the effects of multirate processing using the system in Figure 1.7. The sample-rate compressor (↓D) in the block diagram removes D - 1 of every D input samples, while the sample-rate expander (↑U) inserts U - 1 zeros after every input sample. With the compression and expansion factors set to the same value (D = U), filters FIR 1 and FIR 3 operate at the sample rate Fs, while filter FIR 2 operates at the lower rate of Fs/D.

[Block diagram: In, x[n] → FIR 1 → ↓D → FIR 2 → ↑U → FIR 3 → y[n], Out]

Figure 1.7: Net multirate system
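The two rate-changing blocks can be modelled on whole arrays in a few lines of C. This is only a sketch to fix ideas (the function names are our own); the lab implementation works one sample at a time instead.

```c
#include <stddef.h>

/* Sample-rate compressor: keep one of every d input samples.
   Returns the number of output samples written. */
size_t rate_compress(const int *in, size_t n, size_t d, int *out)
{
    size_t i, m = 0;

    for (i = 0; i < n; i += d)
        out[m++] = in[i];
    return m;
}

/* Sample-rate expander: follow every input sample with u - 1 zeros.
   out must hold n * u samples. Returns the number written. */
size_t rate_expand(const int *in, size_t n, size_t u, int *out)
{
    size_t i, k, m = 0;

    for (i = 0; i < n; i++) {
        out[m++] = in[i];
        for (k = 1; k < u; k++)
            out[m++] = 0;
    }
    return m;
}
```

With D = U = 3, the input 1, 2, 3, 4, 5, 6 is compressed to 1, 4 and then expanded to 1, 0, 0, 4, 0, 0.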

Later, you will implement the system and control the compression and expansion factors at runtime with an interface provided for you. You will be able to disable any or all of the filters to investigate multirate effects. What purpose do FIR 1 and FIR 3 serve, and what would happen in their absence?

A second objective of this lab exercise is to introduce the TI-C549 C environment in a practical DSP application. In this lab, the C environment will be used to a limited extent to handle the basic sample input and output. The program flow and most of the implementation is to be done directly in assembly.

In future labs, the benefits of using the C environment will become clear as larger systems are developed. The C environment provides a fast and convenient way to implement a DSP system using C and assembly modules.

17This content is available online at <http://cnx.org/content/m10858/2.6/>.


1.4.2 Lab 3: Prelab (part 1)18

1.4.2.1 Multirate Theory Exercise

Consider a sampled signal with the DTFT X (ω) shown in Figure 1.8.

Figure 1.8: DTFT of the input signal.

Assuming U = D = 3, use the relations between the DTFT of a signal before and after sample-rate compression and expansion ((1.3) and (1.4)) to sketch the DTFT response of the signal as it passes through the multirate system of Figure 1.9 (without any filtering). Include both the intermediate response W(ω) and the final response Y(ω). It is important to be aware that the translation from digital frequency ω to analog frequency depends on the sampling rate. Therefore, the conversion is different for X(ω) and W(ω).

$$W(\omega) = \frac{1}{D} \sum_{k=0}^{D-1} X\left(\frac{\omega + 2\pi k}{D}\right) \tag{1.3}$$

$$Y(\omega) = W(U\omega) \tag{1.4}$$

[Block diagram: X(ω) → ↓D → W(ω) → ↑U → Y(ω)]

Figure 1.9: Multirate System

18This content is available online at <http://cnx.org/content/m10620/2.14/>.


1.4.3 Lab 3: Prelab (part 2)19

1.4.3.1 Filter-Design Exercise

Using the zero-placement method, design the FIR filters for the multirate system in Multirate Filtering: Introduction (footnote 20). Recall that the z-transform of a length-N FIR filter is a polynomial in z^-1, and that this polynomial can be factored into N - 1 roots.

$$H(z) = h_0 + h_1 z^{-1} + h_2 z^{-2} + \cdots = \left(z_1 - z^{-1}\right)\left(z_2 - z^{-1}\right)\left(z_3 - z^{-1}\right)\cdots \tag{1.5}$$

Use this relation to design a low-pass filter (for the anti-aliasing and anti-imaging filters of the multirate system) by placing twelve complex zeros on the unit circle at ±3π/8, ±π/2, ±5π/8, ±3π/4, ±7π/8, and ±π. The filter that you have just designed will serve for both FIR 1 and FIR 3. For filter FIR 2 (operating at the decimated rate), use four equally-spaced zeros on the unit circle located at ±π/4 and ±3π/4. Be sure to adjust the resulting filter coefficients to ensure that the gain does not exceed one at any frequency.

Design your filters by writing a MATLAB script to compute the filter coefficients from the given zero locations. The MATLAB function poly is very useful for this; type help poly in MATLAB for details.

Once you have determined the coefficients of the filters, use the MATLAB function freqz to plot the frequency responses. You will find that the frequency response of these filters has a large gain. Adjust the resulting filter coefficients to ensure that the largest frequency gain is less than or equal to one by dividing the coefficients by an appropriate value. Do the frequency responses match your expectations based on the locations of the zeros in the z-plane?
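To see what expanding zeros into coefficients involves, consider a conjugate pair of zeros at angles ±θ on the unit circle. Together they contribute the real factor 1 - 2cos(θ) z^-1 + z^-2. The C sketch below multiplies an existing coefficient vector by one such factor; the function name is hypothetical, the caller is assumed to supply cos(θ), and the unscaled monic form is used, so gain normalization is still left to you.

```c
#include <stddef.h>

/* Multiply the polynomial h (length *len, coefficients of z^0, z^-1, ...)
   in place by 1 - 2*cos_theta*z^-1 + z^-2, the factor for a conjugate
   zero pair on the unit circle. h must have room for two more entries. */
void add_zero_pair(double *h, size_t *len, double cos_theta)
{
    size_t n = *len, i;

    h[n] = 0.0;
    h[n + 1] = 0.0;
    /* Work from the highest power down so old values are still intact. */
    for (i = n + 1; i >= 2; i--)
        h[i] += -2.0 * cos_theta * h[i - 1] + h[i - 2];
    h[1] += -2.0 * cos_theta * h[0];
    *len = n + 2;
}
```

Starting from h = {1}, a pair at ±π/2 (cosine 0) gives {1, 0, 1}, that is, 1 + z^-2. Adding a pair at ±π/3 (cosine 0.5) then gives {1, -1, 2, -1, 1}.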

At the beginning of the lab you should be prepared to show the TA your DTFT sketches of W(ω) and Y(ω) as well as the frequency response plots of your designed filters.

19This content is available online at <http://cnx.org/content/m10859/2.4/>.
20"Multirate Filtering: Introduction", Figure 1 <http://cnx.org/content/m10024/latest/#fig1>


1.4.4 Lab 3: Lab21

1.4.4.1 Implementation

As this is your first experience with the C environment, most of the programming for the assignment is to be done directly in assembly. A C skeleton will provide access to input samples and a way to output samples. From the C skeleton, an assembly module for implementing the complete multirate system (for a single sample) is called. In the assembly module, the downsampling and upsampling blocks are implemented by using a loop or counter to determine which samples to keep and when to insert zeros.

As there was a core file for working in the assembly environment (Labs 0-2), there is a core file for the C environment (V:\ece320\54x\dspclib\core.asm) which handles the interrupts from the CODEC (A/D and D/A) and the serial port. Here, we will describe the important aspects of the core code necessary to complete the assignment. The complete documentation on the core code developed for the C environment will be made available soon.

1.4.4.1.1 C Skeleton

Let's examine the following C main program lab3main.c (footnote 22), which calls the assembly FIR filter functions init_filter and filter.

#include "v:/ece320/54x/dspclib/core.h"    /* Declarations for core file */

void init_filter(void);                     /* Prototypes for assembly functions */
int filter(int sample);
extern int dec_rate;                        /* Default decimation rate is 4 */

main()
{
    int *Rcvptr,*Xmitptr;                   /* pointers to Xmit & Rcv Bufs */
    int i, sample1, sample2;

    init_filter();                          /* Initialize the filter */

    while( 1 )
    {
        /* Wait for a new block of samples */
        WaitAudio(&Rcvptr,&Xmitptr);

        /* Process a block of samples */
        for( i = 0; i < BlockLen; i ++ )
        {
            sample1 = Rcvptr[4*i];          /* Ch1 input sample */
            sample2 = Rcvptr[4*i+2];        /* Ch2 input sample */

            Xmitptr[6*i] = sample1;         /* First output is input */
            Xmitptr[6*i+1] = filter(sample1);   /* Second output is result */
        }

        i = SerialRX();                     /* Check serial port */
        if( i > 0 )
            dec_rate = i;                   /* Save new decimation rate if we got it */
    }
}

21This content is available online at <http://cnx.org/content/m10617/2.9/>.
22http://cnx.rice.edu/modules/m10617/latest/lab3main.c

In the main program, an infinite loop operates over the input samples accessed by the pointer Rcvptr and writes the output samples via the pointer Xmitptr. In C, pointers may be used as array names so that Xmitptr[0] is the first word pointed to by Xmitptr. The function WaitAudio is an assembly function in the core code which handles the CODEC interrupts. It returns a block of BlockLen samples and writes BlockLen samples to each of the six channels. As in the assembly core, the input samples are not in consecutive order. The right and left inputs are offset from Rcvptr respectively by 4i and 4i + 2, i = 0, ..., BlockLen - 1. The six output channels are accessed consecutively as offsets from Xmitptr.

1.4.4.1.2 Assembly Functions

Let's examine the calls to the assembly functions init_filter and filter. The assembly file containing these functions is v:\ece320\54x\dspclib\lab3filt.asm

; Lab 3 assembly module

    .copy "v:\ece320\54x\dspclib\core.inc"
                                ; Useful macros for C interfacing

    .global _dec_rate           ; Decimation rate - in lab3main.c
    .global _filter             ; Filter code in this file
    .global _init_filter

FIR_len .set 13

    .sect ".data"

    .align 16                   ; Align to a multiple of 16
firstate
    .space 16*13                ; Allocate 13 words of storage for
                                ; filter state.
    .align 16
coef    .copy "coef1.asm"

stateptr .word 0

_dec_rate .word 4

    .sect ".text"

_init_filter                    ; need the leading _ for a C name.
    ENTER_ASM

    stm #firstate, AR3
    mvmd AR3, stateptr          ; Save AR3

    LEAVE_ASM
    ret

_filter
    ENTER_ASM
    ; Input in low part of A accumulator

    mvdm stateptr, AR3          ; Restore state pointer

    stm #FIR_len,BK             ; initialize circular buffer length
    stm #coef,AR2               ; initialize coefficient pointer
    stm #1,AR0                  ; initialize AR0 for pointer increment

    stl A,*AR3+%                ; store current input into state buffer
    rptz A,(FIR_len-1)          ; clear A and repeat
    mac *AR2+0%,*AR3+0%,A       ; multiply coefficient by state & accumulate

    rnd A                       ; Round off value in 'a' to 16 bits

    sfta a,-16                  ; Shift output to low part of accumulator

    mvmd AR3, stateptr          ; Save state pointer

    ; Output in low part of A accumulator

    LEAVE_ASM
    ret

The assembly file contains two main parts, the data section starting with .sect ".data" and the program section starting with .sect ".text". Every function and variable accessed in C must be preceded by a single underscore _ in assembly and a .global _name must be placed in the assembly file for linking. In this example, filter is an assembly function called from the C program with a label _filter in the text portion of the assembly file and a .global _filter declaration. In each assembly function, the macro ENTER_ASM is called upon entering and LEAVE_ASM is called upon exiting. These macros are defined in v:\ece320\54x\dspclib\core.inc. The ENTER_ASM macro saves the status registers and AR1, AR6, and AR7 when entering a function as required by the register use conventions. The ENTER_ASM macro also sets the status registers to the assembly conventions we have been using (i.e., FRCT=1 for fractional arithmetic and CPL=0 for DP referenced addressing). The LEAVE_ASM macro just restores the saved registers.

1.4.4.1.3 Parameter Passing

The parameter passing convention between assembly and C is simple for single input, single output assembly functions. From a C program, the input to an assembly program is in the low part of accumulator A with the output returned in the same place. In this example, the function filter takes the right input sample from A and returns a single output in A (note the right shift by 16, sfta a,-16, which moves the result from the high part to the low part of A). When more than one parameter is passed to an assembly function, the parameters are passed on the stack (see the core file description for more information). We suggest that you avoid passing or returning more than one parameter. Instead, use global memory addresses to pass in or return more than one parameter. Another alternative is to pass a pointer to the start of a buffer intended for passing and returning parameters.


1.4.4.1.4 Registers Modified

When entering and leaving an assembly function, the ENTER_ASM and LEAVE_ASM macros ensure that certain registers are saved and restored. Since the C program may use any and all registers, the state of a register cannot be expected to remain the same between calls to assembly function(s). Therefore, any information that needs to be preserved across calls to an assembly function must be saved to memory! In this example, stateptr keeps track of the location of the current sample in the circular buffer firstate. Why don't we need to keep track of the location of the coefficient pointer (AR2 in this example) after every sample?
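The idea of keeping state in memory between calls can be modelled in C with a static index into a circular buffer. This is a hypothetical model for illustration only, not a translation of lab3filt.asm: the names are our own, floating point is used for readability, and the buffer is walked backwards from the newest sample.

```c
#define FIR_LEN 13

static double fir_state[FIR_LEN];   /* circular buffer of past inputs */
static int    state_idx = 0;        /* lives in memory between calls, as stateptr does */

/* Filter one sample; the buffer position survives from call to call. */
double fir_filter(const double coef[FIR_LEN], double sample)
{
    double acc = 0.0;
    int k, idx;

    fir_state[state_idx] = sample;             /* newest replaces oldest */
    idx = state_idx;
    for (k = 0; k < FIR_LEN; k++) {            /* k runs over the coefficients */
        acc += coef[k] * fir_state[idx];
        idx = (idx == 0) ? FIR_LEN - 1 : idx - 1;   /* one sample older */
    }
    state_idx = (state_idx + 1) % FIR_LEN;     /* saved for the next call */
    return acc;
}
```

Feeding a unit impulse followed by zeros returns the coefficients one per call, which is a quick way to test such a routine.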

1.4.4.1.5 Compiling and Linking

A working program can be produced by compiling the C code and linking assembly modules and the core module. The compiler translates C code to a relocatable assembly form. The linker assigns physical addresses on the DSP to the relocatable data and code segments, resolves .global references and links runtime libraries.

The procedure for compiling C code and linking assembly modules has been automated for you in the batch file v:\ece320\54x\dsptools\C_ASM.bat. Copy the files lab3main.c and lab3filt.asm from the v:\ece320\54x\dspclib\ directory into your own directory on the W: drive. Using MATLAB, write the coefficients you created in the prelab into a coef1.asm file. Then, type c_asm lab3main lab3filt to produce a lab3main.out file to be loaded onto the DSP. Load the output file onto the DSP as usual and check that it is the FIR filter you designed.

1.4.4.1.6 Cascade of FIR1 and FIR2

Modify the lab3filt.asm assembly module to implement a cascade of filters FIR1 and FIR2. Note that both _filter and _init_filter will need to be modified. Compile and link the new assembly module and confirm it has the frequency response which you expect from cascading FIR1 and FIR2.

1.4.4.1.7 Complete Multirate System

Once you have the cascaded system working, implement the multirate system composed of the three FIR filters by modifying the assembly modules in lab3filt.asm. In order to implement the sample rate converters, you will need to use a counter or a loop. The upsampling block and downsampling block are not implemented as separate sections of code. Your counter or loop will determine when the decimated rate processing is to occur as well as when to insert zeros into FIR3 to implement the zero-filling upsampler.

Some instructions that may be useful for implementing your multirate structure are the addm (add to memory) and bc (branch conditional) instructions. You may also find the banz (branch on auxiliary register not zero) instruction useful, depending on how you implement your code. As the counter is state information that needs to be preserved between calls to filter, the counter must be saved in memory.
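As pseudo-code for the counter approach, here is a C model of one call to the multirate routine. Everything in it is an assumption made for illustration: the three filters are passed in as function pointers, pass_through is a stand-in used only for testing the control flow, and the static variables play the role of values saved in memory.

```c
typedef int (*filter_fn)(int);

static int rate = 4;       /* D = U, changeable at run time like dec_rate */
static int counter = 0;    /* spans calls, so it must live in memory */

/* Stand-in filter that returns its input unchanged. */
int pass_through(int s)
{
    return s;
}

/* Process one input sample at the full rate and return one output. */
int multirate_step(int x, filter_fn fir1, filter_fn fir2, filter_fn fir3)
{
    int v = fir1(x);       /* anti-aliasing filter runs on every sample */
    int u = 0;             /* the expander supplies zeros by default */

    if (counter == 0)
        u = fir2(v);       /* low-rate branch: one sample in every D */

    counter++;
    if (counter >= rate)
        counter = 0;

    return fir3(u);        /* anti-imaging filter runs on every sample */
}
```

With all three filters replaced by pass_through and D = U = 4, the inputs 1, 2, ..., 8 produce 1, 0, 0, 0, 5, 0, 0, 0, which is the compressor followed by the expander alone.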

In order to experiment with multirate effects in your system, make the downsampling factor (D = U) a constant which can be changed easily in your code. Is there a critical (D = U) associated with this system above which aliasing occurs?

It will be useful both for debugging and for experimentation to show the output of your system at various points in the block diagram. By modifying the C code in lab3main.c and the assembly modules in lab3filt.asm, send the following sequences to the DSP output:

• output of FIR1
• input to FIR2 (after downsampling)
• input to FIR3 (after upsampling)

You will have to pass these samples to the main C program by storing them in memory locations as described in Section 1.4.4.1.3 (Parameter Passing). Note that the input to FIR2 is at the downsampled rate.


1.4.4.1.8 Grading and Oral Quiz

For the quiz, you should be prepared to change the decimation rate upon request, and explain the effects of changing the decimation rate on the system's output.

As usual, your grade will be split up into three sections:

• 1 point: Prelab
• 4 points: Code (Code which is complete and working at the beginning of the lab period gets full credit.)
• 5 points: Oral Quiz

The oral quiz may cover various problems in multirate sampling theory, as well as the operation of your code itself and details about the instructions you've used in your code. Be prepared to explain, in detail, the operation of all of your code, even if your lab partner wrote it! You may also be asked to make changes to your code and to predict, and explain, the effects of these changes.

1.4.4.1.9 Extra Credit: 1 point

One of the main benefits of multirate systems is efficiency. Because of downsampling, the output of FIR1 is used only one of D times. Make your assembly module more efficient by using this fact.

Similarly, at the input of FIR3, D - 1 of every D samples are zero. So, for a fixed downsampling factor D, it is possible to make use of this fact to create D different filters (each a subset of the coefficients of FIR3) to be used at the D time instances. This technique is referred to as polyphase filtering and can be found in most modern DSP textbooks. These filters are more efficient as the sum of the lengths of the filters is equal to the length of FIR3. Apply this fact for D = 4.
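The following C sketch shows the arithmetic behind one polyphase sub-filter. It assumes the low-rate samples are kept newest first in an array; the function name and the buffer layout are our own choices for illustration, not a prescribed implementation.

```c
#include <stddef.h>

/* Output of the expander + FIR3 pair at phase p (0 <= p < d) after the
   newest low-rate sample, using only coefficients h[p], h[p+d], h[p+2d], ...
   w[0] is the newest low-rate sample, w[1] the one before it, and so on;
   w must hold at least (len + d - 1) / d samples. */
double polyphase_output(const double *h, size_t len, size_t d,
                        const double *w, size_t p)
{
    double acc = 0.0;
    size_t j, k;

    for (j = 0, k = p; k < len; j++, k += d)
        acc += h[k] * w[j];        /* skips every product with an inserted zero */
    return acc;
}
```

For h = {1, 2, 3, 4, 5}, d = 2 and low-rate samples 10 (newest), 20, 30, phase 0 gives 1*10 + 3*20 + 5*30 = 220 and phase 1 gives 2*10 + 4*20 = 100. These are the same values that full-length filtering of the zero-filled sequence produces, at roughly half the multiplications.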


1.5 Lab 4


1.5.1 Lab 4: Theory23

1.5.1.1 Introduction

In this lab you are going to apply the Fast Fourier Transform (FFT) to analyze the spectral content of an input signal in real time. After computing the FFT of a 1024-sample block of input data, you will then compute the squared magnitude of the sampled spectrum and send it to the output for display on the oscilloscope. In contrast to the systems you have implemented in the previous labs, the FFT is an algorithm that operates on blocks of samples at a time. In order to operate on blocks of samples, you will need to use interrupts to halt processing so that samples can be transferred.

The FFT can be used to analyze the spectral content of a signal. Recall that the FFT is an efficient algorithm for computing the Discrete Fourier Transform (DFT), a frequency-sampled version of the DTFT.

DFT:

$$X[k] = \sum_{n=0}^{N-1} x[n]\, e^{-j \frac{2\pi}{N} nk} \tag{1.6}$$

where n and k ∈ {0, 1, ..., N - 1}.

Your implementation will include windowing of the input data prior to the FFT computation. This is simply a point-by-point multiplication of the input with an analysis window. As you will explore in the prelab exercises, the choice of window affects the shape of the resulting spectrum.

A block diagram representation of the spectrum analyzer you will implement in the lab, including the required input and output locations, is depicted in Figure 1.10.

Figure 1.10: FFT-based spectrum analyzer

23This content is available online at <http://cnx.org/content/m10860/2.6/>.


1.5.2 Lab 4: Prelab24

1.5.2.1 MATLAB Exercise

Since the DFT is a sampled version of the spectrum of a digital signal, it has certain sampling effects. To explore these sampling effects more thoroughly, we consider the effect of multiplying the time signal by different window functions and the effect of using zero-padding to increase the length (and thus the number of sample points) of the DFT. Using the following MATLAB script as an example, plot the squared-magnitude response of the following test cases over the digital frequencies ωc = [π/8, 3π/8].

1. rectangular window with no zero-padding
2. hamming window with no zero-padding
3. rectangular window with zero-padding by factor of four (i.e., 1024-point FFT)
4. hamming window with zero-padding by factor of four

Window sequences can be generated in MATLAB by using the boxcar and hamming functions.

N = 256;                          % length of test signals
num_freqs = 100;                  % number of frequencies to test

% Generate vector of frequencies to test

omega = pi/8 + [0:num_freqs-1]'/num_freqs*pi/4;

S = zeros(N,num_freqs);           % matrix to hold FFT results

for i=1:length(omega)             % loop through freq. vector
    s = sin(omega(i)*[0:N-1]');   % generate test sine wave
    win = boxcar(N);              % use rectangular window
    s = s.*win;                   % multiply input by window
    S(:,i) = (abs(fft(s))).^2;    % generate squared magnitude of FFT
                                  % and store as a column of S
end

clf;
plot(S);                          % plot all spectra on same graph

Make sure you understand what every line in the script does. What signals are plotted?

You should be able to describe the tradeoff between mainlobe width and sidelobe behavior for the various window functions. Does zero-padding increase frequency resolution? Are we getting something for free? What is the relationship between the DFT, X[k], and the DTFT, X(ω), of a sequence x[n]?

24This content is available online at <http://cnx.org/content/m10625/2.8/>.


1.5.3 Lab 4: Lab25

1.5.3.1 Implementation

As your spectrum analyzer works on a block of samples at a time, you will need to use interrupts to pause your processing while samples are transferred from/to the CODEC (A/D and D/A) buffer. Fortunately, the interrupt handling routines have been written for you in a C shell program available at v:\ece320\54x\dspclib\lab4main.c and the core code.

1.5.3.1.1 Interrupt Basics

Interrupts are an essential part of the operation of any microprocessor. They are particularly important in embedded applications where DSPs are often used. Hardware interrupts provide a way for interacting with external devices while the processor executes code. For example, in a key entry system, a key press would generate a hardware interrupt. The system code would then jump to a specified location in program memory where a routine could process the key input. Interrupts provide an alternative to polling. Instead of checking for key presses at a predetermined rate (requires a clock), the system could be busy executing other code. On the TI-C54x DSP, interrupts provide a convenient way to transfer blocks of data to/from the CODEC in a timely fashion.

1.5.3.1.2 Interrupt Handling

The lab4main.c and the core code are intended to make your interaction with the hardware much simpler. At the heart of this interaction is the auto-buffering serial port. In the auto-buffering serial mode, the TI-C54x processor is able to do processing uninterrupted while samples are transferred to/from a buffer of length BlockLen = 64 samples. However, the spectrum analyzer to be implemented in this lab works over a block of N = 1024 samples. If it were possible to compute a 1024-point FFT in the sample time of one BlockLen, then no additional interrupt handling routines would be necessary. Samples could be collected in a 1024-length buffer and a 1024-point FFT could be computed uninterrupted while the auto-buffering buffer fills. Unfortunately, the DSP is not fast enough to accomplish this task.

We now provide an explanation of the shell C program lab4main.c listed in Appendix A (Section 1.5.3.3: Appendix A). The lab4main.c file contains the function interrupt void irq and a main program. The main program is an infinite loop over blocks of N = 1024 samples. Note that while the DSP is executing instructions in this loop, interrupts occur every BlockLen samples. Inside the infinite loop, you will insert code to do the operations which follow. Although each of these operations may be performed in C or assembly, we suggest you follow the guidelines suggested.

1. Transfer inputs and outputs (C)
2. Apply a Hamming Window (C/assembly)
3. Bit-reverse the input (assembly)
4. Apply an N-point FFT (assembly)
5. Compute the magnitude-squared spectrum (C/assembly)
6. Include a trigger pulse (C/assembly)

An interrupt from the CODEC occurs every BlockLen samples. The SetAudioInterrupt(irq) call in the main program tells the core code to jump to the irq function when an interrupt occurs. In the irq function, BlockLen samples of the A/D input in Rcvptr (channel 1) are written to a length N inputs buffer, and BlockLen of the output samples in the outputs buffer are written to the D/A output via Xmitptr on channel 2. On channel 1 of the output, the input is echoed out. You are to fill the buffer outputs with the windowed magnitude-squared FFT values by performing the operations listed above.

25This content is available online at <http://cnx.org/content/m10658/2.10/>.

In the main code, the while(!input_full); loop waits for N samples to collect in the inputs buffer. Next, the N inputs and outputs must be transferred. You are to write this portion of code. This portion of code is to be done first, within BlockLen sample times; otherwise the first BlockLen of samples of output would not be available on time. Once this loop is finished, the lengthy processing of the FFT can continue. During this processing, the DSP is interrupted every BlockLen samples to transfer samples. Once this processing is over, the infinite loop returns to while(!input_full); to wait for N samples to finish collecting.
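The bookkeeping described above can be modelled in plain C. The sketch below is not the irq function from lab4main.c; it is a simplified, hypothetical model (single channel, no interleaving, names other than inputs, outputs and input_full invented) meant only to show how a frame of N samples is assembled from blocks.

```c
#define N         1024
#define BLOCK_LEN 64

static int inputs[N];            /* filled by the handler, one block at a time */
static int outputs[N];           /* drained by the handler at the same pace */
static volatile int input_full;  /* set by the handler, polled by main */
static int pos;                  /* handler's position within the frame */

/* Model of the work done on each CODEC interrupt. rcv and xmit stand in
   for one channel of the receive and transmit blocks. */
void on_block(const int *rcv, int *xmit)
{
    int i;

    for (i = 0; i < BLOCK_LEN; i++) {
        inputs[pos + i] = rcv[i];
        xmit[i] = outputs[pos + i];
    }
    pos += BLOCK_LEN;
    if (pos >= N) {              /* a full frame of N samples has arrived */
        pos = 0;
        input_full = 1;
    }
}
```

In this model the flag is raised after N / BLOCK_LEN = 16 interrupts, which is why the main loop must copy both buffers before the next interrupt starts overwriting inputs.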

The flow diagram in Figure 1.11 summarizes the operation of the interrupt handling routine.

Figure 1.11: Overall program flow of the main function and the interrupt handling function. (a) main (b) interrupt handler

1.5.3.1.3 FFT Routine

As the list of operations indicates, bit-reversal and FFT computation are to be done in assembly. We are providing you with a shell assembly file, available at v:\ece320\54x\dspclib\c_fft_given.asm and shown in Appendix B (Section 1.5.3.4: Appendix B), containing many useful declarations and some code. The code for performing bit-reversal and other declarations needed for the FFT routine are also provided in this section. However, we would like you to enter this code manually, as you will be expected to understand its operation.

Now, we explain how to use the FFT routine provided by TI for the C54x. The FFT routine fft.asm located in v:\ece320\54x\dsplib\ computes an in-place, complex FFT. The length of the FFT is defined as a label K_FFT_SIZE and the algorithm assumes that the input starts at data memory location _fft_data. To have your code assemble for an N-point FFT, you will have to include the following label definitions in your assembly code.


N .set 1024

K_FFT_SIZE .set N ; size of FFT

K_LOGN .set 10 ; number of stages (log_2(N))

In addition to defining these constants, you will have to include twiddle-factor tables for the FFT. These tables (twiddle1, footnote 26, and twiddle2, footnote 27) are available in the shared directory v:\ece320\54x\dsplib\. Note that the tables are each N points long, representing values from 0 to just shy of 180 degrees, and must be accessed using a circular pointer. To include these tables at the proper location in memory with the appropriate labels referenced by the FFT, use the following:

    .sect ".data"
    .align 1024
sine    .copy "v:\ece320\54x\dsplib\twiddle1"
    .align 1024
cosine  .copy "v:\ece320\54x\dsplib\twiddle2"

The FFT provided requires that the input be in bit-reversed order, with alternating real and imaginary components. Bit-reversed addressing is a convenient way to order input x[n] into a FFT so that the output X(k) is in sequential order (i.e. X(0), X(1), ..., X(N - 1) for an N-point FFT). The following table illustrates the bit-reversed order for an eight-point sequence.

Input Order   Binary Representation   Bit-Reversed Representation   Output Order
0             000                     000                           0
1             001                     100                           4
2             010                     010                           2
3             011                     110                           6
4             100                     001                           1
5             101                     101                           5
6             110                     011                           3
7             111                     111                           7

Table 1.1
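The mapping in Table 1.1 can be reproduced with a few lines of C, which may be handy for checking the contents of your buffers in MATLAB or on paper. The function name is our own; on the DSP the reordering is done by the bit-reversed addressing mode rather than by code like this.

```c
/* Reverse the lowest `bits` bits of index i, e.g. 001 -> 100 for bits = 3. */
unsigned bit_reverse(unsigned i, unsigned bits)
{
    unsigned r = 0, b;

    for (b = 0; b < bits; b++) {
        r = (r << 1) | (i & 1u);   /* move the lowest bit of i onto r */
        i >>= 1;
    }
    return r;
}
```

For the 1024-point FFT used in this lab, bits = 10, so index 1 maps to index 512.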

The following routine performs the bit-reversed reordering of the input data. The routine assumes thatthe input is stored in data memory starting at the location labeled _bit_rev_data, which must be alignedto the least power of two greater than the input bu�er length, and consists of alternating real and imaginaryparts. Because our input data is going to be purely real in this lab, you will have to make sure that you setthe imaginary parts to zero by zeroing out every other memory location.

26. http://cnx.org/content/m10658/latest/TWIDDLE1
27. http://cnx.org/content/m10658/latest/TWIDDLE2


bit_rev:
        STM #_bit_rev_data,AR3  ; AR3 -> original input
        STM #_fft_data,AR7      ; AR7 -> data processing buffer
        MVMM AR7,AR2            ; AR2 -> bit-reversed data
        STM #K_FFT_SIZE-1,BRC
        RPTBD bit_rev_end-1
        STM #K_FFT_SIZE,AR0     ; AR0 = 1/2 size of circ buffer
        MVDD *AR3+,*AR2+
        MVDD *AR3-,*AR2+
        MAR *AR3+0B
bit_rev_end:
        NOP

As mentioned, in the above code _bit_rev_data is a label indicating the start of the input data and _fft_data is a label indicating the start of a circular buffer where the bit-reversed data will be written. Note that although AR7 is not used by the bit-reversed routine directly, it is used extensively in the FFT routine to keep track of the start of the FFT data space.

In general, to have a pointer index memory in bit-reversed order, the AR0 register needs to be set to one-half the length of the circular buffer; a statement such as ARx+0B is used to move the ARx pointer to the next location. For more information regarding the bit-reversed addressing mode, refer to page 5-18 in the TI-54x CPU and Peripherals manual. Is it possible to bit-reverse a buffer in place? For a diagram of the ordering of the data expected by the FFT routine, see Figure 4-10 in the TI-54x Applications Guide. Note that the FFT code uses all the pointers available and does not restore the pointers to their original values.
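To make the reverse-carry idea concrete, the following C sketch models what one ARx+0B step does to a buffer offset, and uses it to reorder interleaved complex data in the same way as bit_rev. It is a host-side model written for this explanation, assuming a buffer of 2N words and AR0 = N; the function names are hypothetical and nothing here runs on the DSP.

```c
#include <stddef.h>

/* Model of one "*ARx+0B" step: add `half` (the AR0 value) to `offset`
 * with the carry propagating from the MSB toward the LSB. */
size_t reverse_carry_add(size_t offset, size_t half)
{
    size_t bit = half;

    while (bit != 0 && (offset & bit) != 0) {
        offset ^= bit;    /* this bit overflows: clear it ...      */
        bit >>= 1;        /* ... and carry into the next lower bit */
    }
    return offset | bit;  /* set the first clear bit (or wrap to 0) */
}

/* Copy n complex samples (interleaved re, im) into bit-reversed order.
 * n must be a power of two; both buffers hold 2*n words. */
void bit_reverse_copy(const int *in, int *out, size_t n)
{
    size_t src = 0;       /* word offset into `in`, plays the role of AR3 */
    size_t k;

    for (k = 0; k < n; k++) {
        out[2 * k]     = in[src];          /* real part      */
        out[2 * k + 1] = in[src + 1];      /* imaginary part */
        src = reverse_carry_add(src, n);   /* AR0 = n = half of 2*n words */
    }
}
```

Because each sample occupies two words, stepping the word offset by N with reverse carry visits the complex samples in bit-reversed order while keeping every real/imaginary pair together.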

1.5.3.1.4 Creating the Window

As mentioned, you will be using the FFT to compute the spectrum of a windowed input. For your implementation you will need to create a 1024-point Hamming window. Create a Hamming window in MATLAB using the function hamming, then use save_coef to save the window to a file that can then be included in your code with the .copy directive.

1.5.3.1.5 Displaying the Spectrum

Once the DFT has been computed, you must calculate the squared magnitude of the spectrum for display.

$$|X(k)|^2 = \left(\Re(X(k))\right)^2 + \left(\Im(X(k))\right)^2 \tag{1.7}$$

You may find the assembly instructions squr and squra useful in implementing (1.7).

Because the squared magnitude is always nonnegative, you can replace one of the magnitude values with a -1.0 as a trigger pulse for display on the oscilloscope. This is easily performed by replacing the DC term (k = 0) with a -1.0 when copying the magnitude values to the output buffer. The trigger pulse is necessary for the oscilloscope to lock to a specific point in the spectrum and keep the spectrum fixed on the scope.
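The computation in (1.7) and the trigger pulse can be modeled in C as follows. This is only a sketch: it assumes the FFT output is interleaved Q15 data (real, imaginary) and that a Q15 result saturated just below +1.0 is acceptable for display; the scaling of the provided FFT routine is not stated here, and the function name is hypothetical.

```c
#include <stdint.h>

/* Squared magnitude of n interleaved Q15 complex bins, written as Q15.
 * Bin 0 is replaced by -1.0 (0x8000) to act as the scope trigger. */
void squared_magnitude_q15(const int16_t *fft, int16_t *out, int n)
{
    int k;

    for (k = 0; k < n; k++) {
        int32_t re = fft[2 * k];
        int32_t im = fft[2 * k + 1];
        /* Each square fits in 31 bits; the sum is formed unsigned so that
         * (-1.0)^2 + (-1.0)^2 does not overflow. Q30 -> Q15 by shifting. */
        uint32_t mag = ((uint32_t)(re * re) + (uint32_t)(im * im)) >> 15;

        if (mag > 32767u)
            mag = 32767u;              /* saturate just below +1.0 */
        out[k] = (int16_t)mag;
    }
    out[0] = INT16_MIN;                /* -1.0 in Q15: trigger pulse */
}
```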

1.5.3.1.6 Intrinsics

If you are planning on writing some of the code in C, then you may be forced to use intrinsics. Intrinsic instructions provide a way to use assembly instructions directly in C. An example of an intrinsic instruction is bit_rev_data[0]=_smpyr(bit_rev_data[0],window[0]) which performs the assembly signed multiply round instruction. You may also find the _lsmpy instruction useful. For more information on intrinsics, see page 6-22 of the TI-C54x Optimizing C/C++ Compiler User's Guide.
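To experiment with the windowing step off the DSP, a portable model of a rounded Q15 multiply can stand in for the intrinsic. The sketch below reflects our understanding of a signed multiply with rounding and saturation; it is not TI's implementation, and the names q15_mul_round and window_to_complex are hypothetical. It also writes the zero imaginary parts that the FFT input format requires.

```c
#include <stdint.h>

/* Portable model of a Q15 "signed multiply with rounding":
 * (a*b*2 + 0x8000) >> 16, saturated to the Q15 range.
 * Relies on arithmetic right shift of negative values, which common
 * compilers provide. */
int16_t q15_mul_round(int16_t a, int16_t b)
{
    int32_t product = (int32_t)a * (int32_t)b;     /* Q30 */
    int32_t rounded = (product + 0x4000) >> 15;    /* round, back to Q15 */

    if (rounded > INT16_MAX)
        rounded = INT16_MAX;       /* only -1.0 * -1.0 reaches this case */
    return (int16_t)rounded;
}

/* Window n real samples and store them as interleaved complex data
 * (real part followed by a zero imaginary part). */
void window_to_complex(const int16_t *in, const int16_t *win,
                       int16_t *cplx, int n)
{
    int i;

    for (i = 0; i < n; i++) {
        cplx[2 * i]     = q15_mul_round(in[i], win[i]);
        cplx[2 * i + 1] = 0;
    }
}
```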

1.5.3.2 Quiz Information

From your prelab experiments, you should be able to describe the effect of windowing and zero-padding on FFT spectral analysis. In your DSP system, experiment with different inputs, changing N and the type of window. Does |X(k)|^2 coincide with what you expect from MATLAB? What is the relationship between the observed spectrum and the DTFT?

1.5.3.3 Appendix A:

lab4main.c [28]

/* v:/ece320/54x/dspclib/lab4main.c */
/* dgs - 9/14/2001 */

#include "v:/ece320/54x/dspclib/core.h"

#define N 1024 /* Number of FFT points */

/* Function defined by c_fft_given.asm */
void bit_rev_fft(void);

/* FFT data buffers (declared in c_fft_given.asm) */
extern int bit_rev_data[N*2]; /* Data input for bit-reverse function */
extern int fft_data[N*2];     /* In-place FFT & Output array */
extern int window[N];         /* The Hamming window */

/* Our input/output buffers */
int inputs[N];
int outputs[N];

volatile int input_full = 0; /* volatile means interrupt changes it */
int count = 0;

interrupt void irq(void)
{
    int *Xmitptr,*Rcvptr; /* pointers to Xmit & Rcv Bufs */
    int i;

    static int in_irq = 0; /* Flag to prevent reentrance */

    /* Make sure we're not in the interrupt (should never happen) */
    if( in_irq )
        return;

    /* Mark we're processing, and enable interrupts */
    in_irq = 1;
    enable_irq();

    /* The following waitaudio call is guaranteed not to
       actually wait; it will simply return the pointers. */
    WaitAudio(&Rcvptr,&Xmitptr);

    /* input_full should never be true... */
    if( !input_full )
    {
        for (i=0; i<BlockLen; i++)
        {
            /* Save input, and echo to channel 1 */
            inputs[count] = Xmitptr[6*i] = Rcvptr[4*i];

            /* Send FFT output to channel 2 */
            Xmitptr[6*i+1] = outputs[count];

            count++;
        }
    }

    /* Have we collected enough data yet? */
    if( count >= N )
        input_full = 1;

    /* We're not in the interrupt anymore... */
    disable_irq();
    in_irq = 0;
}

main()
{
    /* Initialize IRQ stuff */
    count = 0;
    input_full = 0;
    SetAudioInterrupt(irq); /* Set up interrupts */

    while (1)
    {
        while( !input_full ); /* Wait for a data buffer to collect */

        /* From here until we clear input_full can only take *
         * BlockLen sample times, so don't do too much here. */

        /* First, transfer inputs and outputs */

        /* . . . insert your code here . . . */

        /* Done with that... ready for new data collection */
        count = 0;      /* Need to reset the count */
        input_full = 0; /* Mark we're ready to collect more data */

        /************************************************************/
        /* Now that we've gotten the data moved, we can do the *
         * more lengthy processing. */

        /* Multiply the input signal by the Hamming window. */

        /* . . . insert your code here . . . */
        /* or in assembly */

        /* Bit-reverse and compute FFT*/
        bit_rev_fft();

        /* Now, take absolute value squared of FFT */
        /* . . . insert your code here . . . */
        /* or in assembly */

        /* Last, set the DC coefficient to -1 for a trigger pulse */
        /* . . . insert your code here . . . */
        /* or in assembly */

        /* done, wait for next time around! */
    }
}

28. http://cnx.org/content/m10658/latest/lab4main.c

1.5.3.4 Appendix B:

c_fft_given.asm [29]

; v:/ece320/54x/dspclib/c_fft_given.asm
; dgs - 9/14/2001
        .copy "v:\ece320\54x\dspclib\core.inc"

        .global _bit_rev_data
        .global _fft_data
        .global _window

        .global _bit_rev_fft

        .sect ".data"

        .align 4*N
_bit_rev_data .space 16*2*N ; Input to _bit_rev_fft

        .align 4*N
_fft_data .space 16*2*N ; FFT output buffer


; Copy in the Hamming window
_window ; The Hamming window
        .copy "window.asm"

        .sect ".text"

_bit_rev_fft
        ENTER_ASM

        call bit_rev ; Do the bit-reversal.

        call fft ; Do the FFT

        LEAVE_ASM
        RET

; Copy the actual FFT subroutine.
fft_data .set _fft_data ; FFT code needs this.
        .copy "v:/ece320/54x/dsplib/fft.asm"


; If you need any more assembly subroutines, make sure you name them
; _name, and include a ".global _name" directive at the top. Also,
; don't forget to use ENTER_ASM at the beginning, and LEAVE_ASM
; and RET at the end!

29. http://cnx.org/content/m10658/latest/c_fft_given.asm


1.6 Lab 5


1.6.1 Lab 5: Introduction [30]

1.6.1.1 Introduction

In this two-week laboratory exercise, you will implement a filter to meet a given set of specifications. Unlike previous labs, you will be graded on the basis of the efficiency of the system that you implement on the DSP. There are two broad ways in which you are to optimize your low-pass filter system:

1. We have left the way in which you implement the low-pass filter system completely open and you are free to choose the method you think will be the most efficient. In filtering techniques (Section 1.6.3), we describe three possibilities:

a. use of an IIR filter

b. overlap-and-add or overlap-and-save using the FFT to perform fast convolution

c. use of multi-rate and multiple filter stages to lower the overall order of the filters required to meet the specifications.

2. Once you have decided on the overall system for your filter implementation and have verified that it will meet the given set of filter specifications through simulation, there are many opportunities for optimization in the actual code that you write for the DSP. You may want to use C in your implementation, but consider the efficiency penalty incurred from that choice. Use of the various parallel instructions available on the DSP, e.g. ld||mac and st||ld, may yield greater efficiency. Of course, classical optimizations such as loop-unrolling or the precomputation of data may improve efficiency.

In this lab, you are required to do the following:

1. Choose two techniques described in filtering techniques (Section 1.6.3) and answer the prelab questions for those two techniques.

2. Write a complete MATLAB simulation for one filtering technique. You must demonstrate that your simulation meets the filter specifications given in the filter specification (Section 1.6.2).

3. Implement the technique that you simulated on the DSP and optimize the system to make it as efficient as possible. You will be graded on the efficiency of your implementation.

A detailed break-down of how you will be graded and the various due-dates can be found in grading (Section 1.6.4).

30. This content is available online at <http://cnx.org/content/m11055/2.3/>.


1.6.2 Lab 5: Filter Specification [31]

1.6.2.1 Filter Specification

Depiction of the filter specifications.

Figure 1.12

Filter specifications

Parameter   Value
fp          0.1
fs          0.12
δp          0.5 dB
δs          -60 dB
∆(τg)       |∆(τg)| ≤ 9, |f| ≤ fp

Table 1.2

Please refer to the pictorial filter specification (Figure 1.12: Depiction of the filter specifications.) for a description of how the various filter constraints correspond to the frequency response of a low-pass filter. Note that Figure 1.12 does not imply that the system response must be equiripple. The constraints on the system response, to be discussed shortly, are on the maximum allowable peak errors. However, in general, an equiripple solution to a given filter specification will require the lowest filter order.

31. This content is available online at <http://cnx.org/content/m11056/2.8/>.

The full filter specification is given in Table 1.2 (Filter specifications). The parameters fp and fs define the end of the pass-band and the beginning of the stop-band, respectively, on a frequency axis that has been normalized to 1. The parameters δp and δs define the maximum allowable ripple, in decibels, in the pass-band and the stop-band, respectively. The constraint |∆(τg)| ≤ 9, |f| ≤ fp limits the variation of the group delay in the pass-band to 9 samples. The parameter τg is the group delay of the system and is defined by the following:

$$\tau_g(\omega) = -\frac{d\,\angle H(\omega)}{d\omega} \tag{1.8}$$

The parameter ∆(τg) is defined as:

$$\Delta(\tau_g) = \max\{\tau_g(\omega) : |\omega| \le \omega_p\} - \min\{\tau_g(\omega) : |\omega| \le \omega_p\} \tag{1.9}$$

Thus, |∆(τg)| ≤ 9, |f| ≤ fp states that the group delay is not allowed to deviate more than 9 samples in the pass-band. A system that has constant group delay will have a phase response that is generalized linear-phase. Thus, deviation from constant group delay is a measure of the deviation of the phase response from linear. The MATLAB command grpdelay computes the group delay of a system as a function of frequency given the filter coefficients a and b.
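Once group delay samples are available (for instance exported from grpdelay), checking (1.9) amounts to taking the difference between the largest and smallest value in the pass-band. The C sketch below illustrates that bookkeeping only; the function name and the array layout are assumptions made for this example.

```c
/* Spread (max - min) of the group delay over the pass-band |f| <= fp.
 * tau[i] is the group delay in samples at normalized frequency f[i]. */
double group_delay_spread(const double *f, const double *tau, int n, double fp)
{
    double lo = 0.0, hi = 0.0;
    int i, seen = 0;

    for (i = 0; i < n; i++) {
        double af = f[i] < 0.0 ? -f[i] : f[i];

        if (af > fp)
            continue;                     /* outside the pass-band */
        if (!seen || tau[i] < lo) lo = tau[i];
        if (!seen || tau[i] > hi) hi = tau[i];
        seen = 1;
    }
    return hi - lo;                       /* 0 if no point is in the pass-band */
}
```

A design would satisfy the group delay constraint of Table 1.2 if the returned value does not exceed 9 for a sufficiently dense frequency grid.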

note: For which of the three techniques discussed in filtering techniques (Section 1.6.3) must we verify explicitly that the group delay specification is met? All of them, some of them, or none of them?

note: Why do we specify the filter coefficients for only the positive frequencies? What are we assuming? What does this imply about the coefficients a and b of the low-pass filter?


1.6.3 Lab 5: Prelab [32]

1.6.3.1 IIR Filter Design Methods

1.6.3.1.1 Overview

Implementing the narrow-band LPF using an IIR filter is probably the most difficult option to design but the most straightforward to implement. One reason for the design difficulty stems from the fact that in order to get such a sharp response, poles close to the unit circle are needed. These poles can then drift outside the unit circle and the system can then become unstable when finite precision effects are added. Also, perfectly linear phase (constant group delay) cannot be realized using IIR filtering.

There are several approaches to designing an approximately linear phase IIR filter. For example, an IIR filter could be run on blocks of samples in both the forward and reverse direction and the results of each block added together; using the filter in both directions would cancel the nonlinear phase response of the filter. Also, iterative design methods exist to design filters that simultaneously minimize errors in magnitude and group delay. Yet another approach is to design an IIR filter which approximates the desired magnitude response (e.g., an elliptic filter using the ellip command in MATLAB) and then design an IIR all-pass filter which compensates for the nonlinear phase. This last approach is the one we will examine here.

1.6.3.1.2 MATLAB Filter Design Toolbox

The MATLAB Filter Design (FD) Toolbox contains algorithms used for optimal or near optimal design of filters subject to various constraints. You can view a description of the toolbox and the functions it contains at this link [33]. The FD Toolbox will be installed on the white Dell machine nearest the door in the ECE 320 lab.

1.6.3.1.3 Using the Filter Design Toolbox

Although it is possible to design a very good LPF magnitude response using an elliptic filter, we do not have the advantage with the ellip command of being able to constrain the poles away from the unit circle to prevent instability. Fortunately, the FD Toolbox provides a command called iirlpnormc which allows us to keep the poles within a circle of a specified radius. Note that this command implements a least-p'th algorithm. The term least-p'th signifies that the algorithm attempts to minimize an Lp-norm error. In the case of the magnitude response, the Lp-norm error is given as

$$\|H\|_p = \left( \frac{1}{2\pi} \int_{-\pi}^{\pi} |H(\omega) - H_d(\omega)|^p \, W(\omega) \, d\omega \right)^{1/p} \tag{1.10}$$

where H is the actual frequency response, Hd is the desired response, and W is some weighting function. Most of the time the weighting function used is one which equals 1 over the pass-band and stop-band and 0 in the transition band. The role of W in the above equation is similar to that used in FIR filter design (remez). The relative weights in the stop-band and pass-band are set by W and control the relative magnitude of the ripples in these bands. Note that minimizing the L2-norm is equivalent to minimizing the RMS error in the magnitude. In contrast, the L∞-norm is equivalent to minimizing the maximum error over the frequencies of interest (why?). In this lab we are concerned with minimizing the L∞-norm. Of course, we cannot use infinity in any of our computations so using a large number (e.g. 128) must suffice.

Once our magnitude response has been selected, we need to perform group delay equalization to yield approximately constant group delay. This can be done using the iirgrpdelay command in the FD Toolbox. Again, note that a least-p'th algorithm is used and that we can constrain the radius of the poles. The resulting all-pass filter can be connected in series with the nonlinear phase low-pass filter created with iirlpnormc to complete the entire system.

32. This content is available online at <http://cnx.org/content/m11057/2.6/>.
33. http://www.mathworks.com/access/helpdesk/help/toolbox/filterdesign/filterdesign.shtml


The FD Toolbox can also aid in analyzing quantization effects. We suggest using FDATool, a convenient GUI for interacting with the FD Toolbox, to carry out the analysis. See the internet documentation for more information on the functions available. Of course, you may choose to do this by scaling and rounding as you have done in previous labs. Note that even though MATLAB uses high-precision arithmetic, you may find that for long IIR filters MATLAB has difficulty rendering frequency responses, etc. Thus, you may find it useful to design a filter that has half the pass-band ripple, half the stop-band attenuation, etc. and implement it twice in your code to meet the specification.

Note that FDATool and the filter design toolbox (qfilt function) can be used to analyze quantization effects on various filter structures, as well as on the FFT. The quantization parameters can be chosen and optimized in FDATool. Also, FDATool (with or without the filter design toolbox) can compute correct scaling to avoid overflow.

1.6.3.1.4 Implementation Structures

There are several ways to implement an IIR filter. One of these, a cascade of second-order systems, we have already seen. An alternative is placing these second-order sections in parallel. Another common implementation is a lattice structure (see any standard DSP textbook), which tends to be more resistant to finite word-length effects and may be more computationally efficient. To examine your choices, as a starting point you should examine the MATLAB functions listed as "Linear System Transformations" when you type help signal at the command prompt (does not require the FD Toolbox).
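As a reminder of what the cascade structure computes, here is a floating-point C sketch of second-order sections run in series. It uses the transposed direct form II as one possible section structure, which is a choice made for this example and not necessarily the structure used in earlier labs; a DSP implementation would additionally need fixed-point scaling. The type and function names are hypothetical.

```c
/* One second-order section in transposed direct form II.
 * Coefficients are normalized so that a0 = 1. */
typedef struct {
    double b0, b1, b2;   /* numerator coefficients   */
    double a1, a2;       /* denominator coefficients */
    double s1, s2;       /* internal state           */
} biquad;

/* Process one sample through one section. */
double biquad_step(biquad *q, double x)
{
    double y = q->b0 * x + q->s1;

    q->s1 = q->b1 * x - q->a1 * y + q->s2;
    q->s2 = q->b2 * x - q->a2 * y;
    return y;
}

/* Run one input sample through a cascade of sections. */
double cascade_step(biquad *sections, int count, double x)
{
    int i;

    for (i = 0; i < count; i++)
        x = biquad_step(&sections[i], x);   /* output of one feeds the next */
    return x;
}
```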

One of the difficult aspects of an IIR lattice is that although the lattice coefficients are in the interval (-1,1), the internals of the lattice can grow to be prohibitively large. To compensate for this, an m-file (latcfiltn.m) has been created that performs normalization after each lattice section to prevent overflow. If you are interested in exploring a lattice implementation you may want to copy this m-file to your own directory and modify it to suit your needs. Note that there are comments within the file to indicate where you might add checks for overflow conditions.

We suggest the use of FDATool and dfilt for structure transformations. The function dfilt also works without the Filter Design Toolbox. It is also useful for evaluating cascade or parallel connections of sub-filters. The MATLAB command fvtool can be used to quickly evaluate the frequency response of various filter structures.

Many extremely efficient structures for IIR filter implementations exist. Two of special note are the following:

• All-pass filter implementations with N multiplies for an order N filter (instead of 2N multiplies for a cascade realization). These are structurally all-pass, meaning that they remain all-pass even when their coefficients are quantized. (S. Mitra, Digital Signal Processing, a Computer Based Approach, 2nd Ed., pp. 378-382).

• Parallel all-pass realization of IIR transfer functions with N multiplies for an N'th order filter. (S. Mitra, Digital Signal Processing, a Computer Based Approach, 2nd Ed., pp. 401-405, and Sec. 9.9, pp. 629-633).

1.6.3.1.5 Questions

1. Generate an elliptic LPF using the command: [b,a]=ellip(4,.5,10,.1); Using MATLAB commands, generate a cascaded second-order system implementation and a lattice implementation (don't worry about normalization if you don't want to) of this system and compare their advantages and disadvantages, especially as they relate to implementation on the C5400.

2. How close to the unit circle are the poles of the system from question 1? Does this concern you? Explore how much the poles moved in the 2 implementations of part 1.

3. Use the grpdelay command to view the group delay for the filter in the pass-band. Is it approximately linear?


4. Why does minimizing the L∞-norm equate to minimizing the maximum error over a given frequency range?

1.6.3.1.6 Simulation

Simulate the IIR system in MATLAB. Compute the response of the system with appropriate test inputs. Make sure to include side-effects due to finite precision in your simulation.

1.6.3.2 Multi-rate/Multi-stage

1.6.3.2.1 Reading Exercise

Read through the following resources:

• "Optimum FIR Digital Filter Implementations for Decimation, Interpolation, and Narrow-Band Filtering," by Crochiere and Rabiner. This is a colloquial paper on the topic of multi-stage filter implementation. The paper is available here [34].

• Course notes on multi-stage filter implementation by Prof. Mark Fowler [35] from Binghamton University. The notes are available here [36].

1.6.3.2.2 Design Exercises

Given the filter specification given in the filter specification (Section 1.6.2), answer the following questions:

• What is the maximum decimation factor that can be used?

• What is the average number of MACs per input sample that are required for a single stage implementation?

• What are the appropriate decimation and interpolation factors for a two stage implementation?

• What are the appropriate pass-band and stop-band frequencies and maximum ripple for the overall filter at each stage? Your answer will demonstrate that the use of multiple filter stages along with multi-rate signal processing can achieve an overall filter of lower order than just a single stage filter.

• Estimate the filter order for each stage. We recommend using the MATLAB command remezord. This algorithm frequently underestimates the filter order needed, but gives you a good starting point. Verify that the filter specifications are met, i.e. pass-band and stop-band ripple and pass-band and stop-band band edge locations. Do this by passing the arguments returned by remezord to the MATLAB command remez. Observe the frequency response of the system described by the filter coefficients returned by remez using the MATLAB command freqz. If the specifications are not met, increase the order of the filter until the specifications are met.

• Determine the average number of MACs per sample for the two-stage implementation. Which is more efficient, the single stage approach or the multistage approach?
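When comparing MAC counts, it can help to keep the bookkeeping in a small helper. The C sketch below is one way to tally multiply-accumulates per input sample for a chain of decimating FIR stages, assuming each stage computes only the outputs it keeps; interpolation stages and any symmetry savings are not modeled. The function name and the filter lengths used in the usage note are made up for illustration and are not answers to the exercises.

```c
/* Average multiply-accumulates per input sample for a chain of FIR stages.
 * len[i] is the filter length of stage i and dec[i] its decimation factor;
 * each stage is assumed to compute only the outputs that survive. */
double macs_per_input(const int *len, const int *dec, int stages)
{
    double total = 0.0;
    double rate = 1.0;     /* stage output rate relative to the input rate */
    int i;

    for (i = 0; i < stages; i++) {
        rate /= dec[i];                /* outputs per original input sample */
        total += len[i] * rate;        /* len[i] MACs per retained output   */
    }
    return total;
}
```

For example, with hypothetical lengths, a single 200-tap stage with no decimation costs 200 MACs per input sample, while a 40-tap stage decimating by 4 followed by a 60-tap stage at the lower rate costs 40/4 + 60/4 = 25.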

1.6.3.2.3 Matlab Simulation

Using your results from the previous part, simulate the two-stage multi-rate filter in MATLAB. Plot the frequency response of each stage's filter using freqz and determine the overall frequency response of your multi-rate system to verify that it meets the specifications. Since there is not a command for directly finding the frequency response plot of a multi-rate system in MATLAB, you will have to be a bit creative.

34. http://www.ews.uiuc.edu/~ece320/crochiererabiner.pdf
35. http://www.ee.binghamton.edu/fowler/
36. http://www.ews.uiuc.edu/~ece320/multistage.pdf


1.6.3.2.4 Additional Questions (optional, but for your benefit)

• Does it make a difference in which order the two decimations are done in a two-stage implementation?

• Could / would you add additional stages? Why or why not?

• Are quantization effects more or less pronounced in the multi-stage implementation compared to a direct implementation? Why or why not?

1.6.3.3 Fourier-Based Filtering Methods

It is possible to perform linear convolution quickly using the FFT. This idea allows for the efficient implementation of an FIR filter when the number of filter coefficients and the length of the input sequences are large.
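The block bookkeeping behind this idea can be sketched without an FFT. In the C example below, each block is convolved directly with the filter, which stands in for the FFT, multiply, inverse FFT sequence, and the tails of consecutive blocks are added where they overlap. This illustrates the overlap-and-add arrangement only; it is not an efficient implementation, and the function name is hypothetical.

```c
#include <stdlib.h>

/* Overlap-add filtering of x[0..nx-1] with h[0..m-1] in blocks of
 * block_len samples. y must hold nx + m - 1 values.
 * Returns 0 on success, -1 if scratch memory cannot be allocated. */
int overlap_add(const double *x, int nx, const double *h, int m,
                int block_len, double *y)
{
    int seg_len = block_len + m - 1;     /* length of one block's output */
    double *seg = malloc((size_t)seg_len * sizeof *seg);
    int start, i, j;

    if (seg == NULL)
        return -1;
    for (i = 0; i < nx + m - 1; i++)
        y[i] = 0.0;

    for (start = 0; start < nx; start += block_len) {
        int len = (nx - start < block_len) ? nx - start : block_len;

        /* Linear convolution of this block with h (stand-in for the
         * FFT-based product of length >= len + m - 1). */
        for (i = 0; i < seg_len; i++)
            seg[i] = 0.0;
        for (i = 0; i < len; i++)
            for (j = 0; j < m; j++)
                seg[i + j] += x[start + i] * h[j];

        /* Overlap-add: the last m-1 values land on top of the next block. */
        for (i = 0; i < len + m - 1; i++)
            y[start + i] += seg[i];
    }
    free(seg);
    return 0;
}
```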

1.6.3.3.1 Questions

• Read Lecture 49 of the ECE 310 Course Notes on "Block Convolution." This lecture provides an excellent overview of two methods for efficiently performing convolution using the FFT: "Overlap and Add" and "Overlap and Save." For a more in depth treatment of these methods, refer to Discrete-Time Signal Processing by Alan Oppenheim and Ronald Schafer.

• Simulate both (1) an overlap and add and (2) an overlap and save filtering implementation in MATLAB. Your simulations should work for any choice of an FIR filter. The filter length M and block length L should be variable parameters.

• Verify that your simulated systems are working properly by comparing their performance with a direct FIR implementation. Test using several FIR filter designs and appropriate test inputs.

• Derive expressions for the amount of computation (in terms of multiply accumulates) required per input sample for both the overlap and add and overlap and save implementations. Plot the computation per sample as a function of the input block length (for a particular filter size M) for both schemes. Is there a value of M for which the Direct FIR is always more efficient? Derive an expression for the optimal block size L in terms of the filter length M for both implementations.

• In the DSP implementation, the input sequence is purely real. The values of the imaginary components are all set to zero. We can speed up the implementations further by exploiting the symmetry properties of the Fourier transform. These properties are stated as follows:

$$\mathrm{DFT}(\Re(x(n))) = \mathrm{Even}(X(\omega)) \tag{1.11}$$

$$\mathrm{DFT}(j\,\Im(x(n))) = \mathrm{Odd}(X(\omega)) \tag{1.12}$$

Using these properties, determine how to get two FFT's for the price of one. Implement this scheme in MATLAB, and verify that the operation is correct.

• Design an FIR filter that meets the filter specification given in the filter specification (Section 1.6.2). Lecture 38 of the ECE 310 notes on "Parks-McClellan" might be a good reference here. Design an efficient implementation of this filter using the methods you explored above. The MATLAB commands remezord and remez may be of great help. Simulate this implementation in MATLAB, programming in such a way that you can easily convert your MATLAB simulation to assembly. Find the number of computations per input for your method.

• What are the benefits and trade-offs of using the Fourier-based method in terms of accuracy of the filter specification, finite precision errors, and computational expense? Compare with the IIR and multi-rate filter implementations.

Be prepared to show all the necessary plots and MATLAB simulations as well as answers to all of the questions posed above to your T.A. as your prelab.


1.6.3.3.2 Implementation Issues

Due to the limitations of the core file, it is not possible to take in more than 64 input samples from the A/D converter at a time (unless the core file is rewritten to accomplish this task). Therefore, when implementing a Fourier-based filter, you should use the C skeleton from Lab 4 to perform the FFT on a large block of samples. All of your filtering operations (i.e., the multiplications of DFT samples, the additions of the overlap, the discarding of samples) and function calls must be performed in assembly. You will be graded on the number of cycles per input sample based on the portion of code in your assembly routine.

You should use the fft.asm routine provided in Lab 4 to perform the forward and reverse FFT's. You should study this file to determine how it works. If you need to change the length of the FFT, you will first need to change the relevant parameters in your assembly file (i.e., N, K_FFT_SIZE, K_LOGN, and other variables). You will also need to change the following parameters in the FFT file:

K_TWID_TBL_SIZE

K_TWID_IDX_3

K_TWID_TBL_SIZE is the size of the twiddle tables (how long should these be for a given FFT length?) and K_TWID_IDX_3 is the amount by which the program increments through the twiddle table during the third stage of the FFT. What is this increment for a given N? Is fft.asm a decimation-in-time or decimation-in-frequency algorithm?

You will also need a modified twiddle table when you change the length of the FFT to use fft.asm as written. For a length 1024 FFT, the twiddle tables are length 512 each. TWIDDLE1 is a table of sine values from zero to π, and TWIDDLE2 is a table of cosine values from π/2 to 3π/2. The support for the cosine and sine is different because the fft.asm code uses the fact that sin(−θ) = −sin(θ) when performing computations. If you want a length 64 FFT, you will need to "decimate" the twiddle table to length 32, or in other words, only keep one out of every 16 lines in the twiddle tables and discard the rest. We will provide a MATLAB function, edit_twiddle.m, for this purpose. The function call in this example would be:

edit_twiddle('TWIDDLE1','new_twiddle1',16)

You should verify that the new twiddle tables you generate indeed have 32 elements. To perform an inverse FFT, you can use the standard FFT algorithm and then appropriately scale and shift the outputs. Lecture 43 of the ECE 310 notes on the Discrete Fourier Transform suggests how this may be done (Property 3 of "Properties of the DFT").
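The table "decimation" described above is simple enough to model in a few lines. The C sketch below keeps every step-th entry of a table and reports how many entries were kept, which gives a quick way to confirm the expected length; it only illustrates the idea and is not the provided edit_twiddle.m, whose internals are not shown here. The function name is hypothetical.

```c
/* Keep every `step`-th entry of a table, starting with the first.
 * Returns the number of entries written to `out`. */
int decimate_table(const short *in, int in_len, int step, short *out)
{
    int i, kept = 0;

    for (i = 0; i < in_len; i += step)
        out[kept++] = in[i];
    return kept;
}
```

Applied to a 512-entry table with a step of 16, this keeps entries 0, 16, 32, ..., 496, for a total of 32.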


1.6.4 Lab 5: Grading [37]

1.6.4.1 Obtaining a Cycle Count

Obtaining a cycle count for a given section of code is easily accomplished with the aid of Code Composer. Here is the procedure for obtaining a cycle count:

1. Load the .out file of the program you wish to profile into Code Composer.

2. Set a breakpoint at the beginning of the section of code that you want to profile. This can be easily done by right-clicking on the line of code beginning the section and choosing "Toggle breakpoint" from the list of options.

3. Set a breakpoint at the end of the section of code that you want to profile.

4. Open the "Profiler" menu in Code Composer. Choose "View Clock." A window should open at the bottom of the Code Composer window that indicates the current value of the cycle counter.

5. Again, open the "Profiler" menu in Code Composer and choose "Enable Clock."

6. Choose "Run" from the "Debug" menu to execute your code. Execution will halt at the first breakpoint. Double-click the cycle counter to reset its value to zero (double click the number next to "Clock =").

7. Again, choose "Run" from the "Debug" menu to continue execution of your code. Execution will halt at the second breakpoint and the cycle counter will display the number of CPU cycles that were needed to execute the section of code delimited by the breakpoints.

1.6.4.2 Grading Information

This is a two-week lab. Your prelab is due a week after the quiz for Lab 4 and the quizzing occurs two weeks after the quiz for Lab 4.

The following details how the 10 points for the lab will be assigned:

• 1 point: Prelab. You must answer the prelab questions for two filtering techniques in Filtering Techniques (Section 1.6.3). You must fully simulate one filtering technique in MATLAB and demonstrate that the system will meet the specifications as given in the filter specification (Section 1.6.2). Finite-precision effects such as coefficient quantization must be modeled in your simulation. You must provide the response of your simulated system to appropriate test inputs.

• 2 points: Working code. Code that does not meet the specification is not considered working code. You must be prepared to demonstrate that your code meets the specification by showing the system response to a frequency-swept sine input. We will check that the ripple specifications have been met in the pass-band and the stop-band. We will also check the locations of the pass-band and stop-band edges to make sure that the width of the transition band is met. We will also check that the phase constraint given in the specification is met, and that you have scaled the input signal appropriately so that clipping does not occur at the output for any frequency. Additionally, we will check that you do not scale the input too conservatively; specifically, a 1 V amplitude sinusoidal input is not allowed to drop below 0.5 V in amplitude at the output for any frequency in the pass-band.

• 2 points: Oral quiz.

• 5 points: Optimization. These points will be assigned based on your cycle count. You will be judged relative to your peers. Code that does not meet the specification will be penalized. After all the cycle counts have been collected, we will order them for scoring in two ways. First, the cycle counts will be ordered globally over the whole class. The groups with the lowest numbered cycle counts will be given the maximum number of points and the groups with the highest numbered cycle counts will be given the minimum number of points. The second ordering will be done locally relative to which of the three filtering techniques the groups have picked. The groups using the same filtering technique will be ordered by cycle count, with the groups with the lowest cycle counts receiving maximum points and the groups with the highest cycle counts receiving minimum points. 3 points will be assigned for the global ordering of groups and 2 points for the local ordering with respect to filtering technique of groups.

37 This content is available online at <http://cnx.org/content/m11058/2.4/>.

1.6.4.3 Pizza Competition

Your final assembly code and/or C source code for the pizza competition must be emailed to [email protected] no later than 11:59 PM on Monday, March 17. However, your optimization grade will be assigned based on the code turned in during your assigned lab section and is subject to the usual policies for late code.


Chapter 2

Project Labs

2.1 Digital Receiver

2.1.1 Digital Receiver: Symbol-Timing Recovery for QPSK (footnote 1)

2.1.1.1 Introduction

This receiver exercise introduces the primary components of a QPSK receiver with specific focus on symbol-timing recovery. In a receiver, the received signal is first coherently demodulated and low-pass filtered (see Digital Receivers: Carrier Recovery for QPSK (Section 2.1.2)) to recover the message signals (in-phase and quadrature channels). The next step for the receiver is to sample the message signals at the symbol rate and decide which symbols were sent. Although the symbol rate is typically known to the receiver, the receiver does not know when to sample the signal for the best noise performance. The objective of the symbol-timing recovery loop is to find the best time to sample the received signal.

Figure 2.1 illustrates the digital receiver system. The transmitted signal is coherently demodulated with both a sine and a cosine, then low-pass filtered to remove the double-frequency terms, yielding the recovered in-phase and quadrature signals, $\hat{s}_I[n]$ and $\hat{s}_Q[n]$. These operations are explained in Digital Receivers: Carrier Recovery for QPSK (Section 2.1.2). The remaining operations are explained in this module. Both branches are fed through a matched filter and re-sampled at the symbol rate. The matched filter is simply an FIR filter with an impulse response matched to the transmitted pulse. It aids in timing recovery and helps suppress the effects of noise.

1This content is available online at <http://cnx.org/content/m10485/2.14/>.


Figure 2.1: Digital receiver system

If we consider the square wave shown in Figure 2.2 as a potential recovered in-phase (or quadrature) signal (i.e., we sent the data $[+1, -1, +1, -1, \ldots]$), then sampling at any point other than the symbol transitions will result in the correct data.


Figure 2.2: Clean BPSK waveform.


Figure 2.3: Noisy BPSK waveform.

However, in the presence of noise, the received waveform may look like that shown in Figure 2.3. In this case, sampling at any point other than the symbol transitions does not guarantee a correct data decision. By averaging over the symbol duration we can obtain a better estimate of the true data bit being sent (+1 or −1). The best averaging filter is the matched filter, which has the impulse response $u[n] - u[n - T_{symb}]$, where $u[n]$ is the unit step function, for the simple rectangular pulse shape used in Digital Transmitter: Introduction to Quadrature Phase-Shift Keying (see footnotes 2 and 3). Figure 2.4 and Figure 2.5 show the result of passing both the clean and noisy signal through the matched filter.
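The averaging action of this matched filter can be sketched in a few lines. The Python fragment below is only an illustration, not the lab's MATLAB solution: the function name `matched_filter`, the four-symbol test sequence, and the choice of an unnormalized running sum are assumptions made for the example.

```python
def matched_filter(x, t_symb):
    """Running sum of the last t_symb samples.

    This is the convolution of x with the rectangular impulse
    response u[n] - u[n - t_symb].
    """
    y = []
    acc = 0.0
    for n, sample in enumerate(x):
        acc += sample                 # newest sample enters the window
        if n >= t_symb:
            acc -= x[n - t_symb]      # oldest sample leaves the window
        y.append(acc)
    return y


T_SYMB = 16                           # samples per symbol, as in the text
symbols = [+1, -1, +1, -1]            # hypothetical clean BPSK data
x = [float(s) for s in symbols for _ in range(T_SYMB)]
y = matched_filter(x, T_SYMB)

# The filter lines up with symbol k at the last sample of that symbol.
peaks = [y[(k + 1) * T_SYMB - 1] for k in range(len(symbols))]
print(peaks)
```

For this clean input the sampled peaks alternate in sign with the transmitted data, and their magnitude equals the symbol length because the sum is not normalized.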

2 "Digital Transmitter: Introduction to Quadrature Phase-Shift Keying" <http://cnx.org/content/m10042/latest/>

3 For digital communications schemes involving different pulse shapes, the form of the matched filter will be different. Refer to the listed references for more information on symbol timing and matched filters for different symbol waveforms.

Figure 2.4: Averaging filter output for clean input.

Figure 2.5: Averaging filter output for noisy input.

Note that in both cases the output of the matched filter has peaks where the matched filter exactly lines up with the symbol. A positive peak indicates that a +1 was sent; likewise, a negative peak indicates that a −1 was sent. Although there is still some noise in the second figure, the peaks are relatively easy to distinguish and yield a considerably more accurate estimate of the data (+1 or −1) than we could get by sampling the original noisy signal in Figure 2.3.

The remainder of this handout describes a symbol-timing recovery loop for a BPSK signal (equivalent to a QPSK signal where only the in-phase signal is used). As with the above examples, a symbol period of $T_s = 16$ samples is assumed.

2.1.1.1.1 Early/late sampling

One simple method for recovering symbol timing is performed using a delay-locked loop (DLL). Figure 2.6 is a block diagram of the necessary components.


Figure 2.6: DLL block diagram.

Consider the sawtooth waveform shown in Figure 2.4, the output of the matched filter with a square wave as input. The goal of the DLL is to sample this waveform at the peaks in order to obtain the best performance in the presence of noise. If it is not sampling at the peaks, we say it is sampling too early or too late.

The DLL will find peaks without assistance from the user. When it begins running, it arbitrarily selects a sample, called the on-time sample, from the matched filter output. The sample from the time-index one greater than that of the on-time sample is the late sample, and the sample from the time-index one less than that of the on-time sample is the early sample. Figure 2.7 shows an example of the on-time, late, and early samples. Note in this case that the on-time sample happens to be at a peak in the waveform. Figure 2.8 and Figure 2.9 show examples in which the on-time sample comes before a peak and after the peak.

The on-time sample is the output of the DLL and will be used to decide the data bit sent. To achieve the best performance in the presence of noise, the DLL must adjust the timing of on-time samples to coincide with peaks in the waveform. It does this by changing the number of time-indices between on-time samples. There are three cases:

1. In Figure 2.7, the on-time sample is already at the peak, and the receiver knows that peaks are spaced by $T_{symb}$ samples. If it then takes the next on-time sample $T_{symb}$ samples after this on-time sample, it will be at another peak.

2. In Figure 2.8, the on-time sample is too early. Taking an on-time sample $T_{symb}$ samples later will be too early for the next peak. To move closer to the next peak, the next on-time sample is taken $T_{symb} + 1$ samples after the current on-time sample.

3. In Figure 2.9, the on-time sample is too late. Taking an on-time sample $T_{symb}$ samples later will be too late for the next peak. To move closer to the next peak, the next on-time sample is taken $T_{symb} - 1$ samples after the current on-time sample.

The offset decision block uses the on-time, early, and late samples to determine whether sampling is at a peak, too early, or too late. It then sets the time at which the next on-time sample is taken.


Figure 2.7: Sampling at a peak.

Figure 2.8: Sampling too early.


Figure 2.9: Sampling too late.

The input to the offset decision block is on-time × (late − early), called the decision statistic. Convince yourself that when the decision statistic is positive, the on-time sample is too early; when it is zero, the on-time sample is at a peak; and when it is negative, the on-time sample is too late. It may help to refer to Figure 2.7, Figure 2.8, and Figure 2.9. Can you see why it is necessary to multiply by the on-time sample?
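One way to check these sign claims is to evaluate the statistic on a few hand-picked points. The sketch below is illustrative Python, not part of the lab: the function name `decision_statistic` and the sample values (points on a triangular matched-filter output with a slope of 2 per sample and peaks of ±16) are assumptions chosen for the example.

```python
def decision_statistic(early, on_time, late):
    """on-time * (late - early), as fed to the offset decision block."""
    return on_time * (late - early)


# (early, on_time, late) triples around positive and negative peaks.
cases = {
    "at positive peak":        (14, 16, 14),
    "early, positive symbol":  (12, 14, 16),
    "late, positive symbol":   (16, 14, 12),
    "early, negative symbol":  (-12, -14, -16),
    "late, negative symbol":   (-16, -14, -12),
}

results = {name: decision_statistic(*samples) for name, samples in cases.items()}
for name, value in results.items():
    print(f"{name}: {value}")
```

Comparing the two "early" rows (and the two "late" rows) shows what the on-time factor contributes when the symbol polarity changes.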

The offset decision block could adjust the time at which the next on-time sample is taken based only on the decision statistic. However, in the presence of noise, the decision statistic becomes a less reliable indicator. For that reason, the DLL adds many successive decision statistics and corrects timing only if the sum exceeds a threshold; otherwise, the next on-time sample is taken $T_{symb}$ samples after the current on-time sample. The assumption is that errors in the decision statistic caused by noise, some positive and some negative, will tend to cancel each other out in the sum, and the sum will not exceed the threshold because of noise alone. On the other hand, if the on-time sample is consistently too early or too late, the magnitude of the added decision statistics will continue to grow and exceed the threshold. When that happens, the offset decision block will correct the timing and reset the sum to zero.

2.1.1.1.2 Sampling counter

The symbol sampler maintains a counter that decrements every time a new sample arrives at the output of the matched filter. Because the counter counts down, the first of the three saved samples is the earliest in time: when the counter reaches three, the matched-filter output is saved as the early sample; when the counter reaches two, the matched-filter output is saved as the on-time sample; and when the counter reaches one, the matched-filter output is saved as the late sample. After saving the late sample, the counter is reset to either $T_{symb} - 1$, $T_{symb}$, or $T_{symb} + 1$, according to the offset decision block.

2.1.1.2 MATLAB Simulation

Because the DLL requires a feedback loop, you will have to simulate it on a sample-by-sample basis in MATLAB.

Using a square wave of period 32 samples as input, simulate the DLL system shown in Figure 2.6. Your input should be several hundred periods long. What does it model? Set the decision-statistic sum-threshold to 1.0; later, you can experiment with different values. How do you expect different thresholds to affect the DLL?
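As a rough guide to the structure of such a simulation, here is a sample-by-sample sketch in Python (used only for illustration; the exercise itself calls for MATLAB). Several things are assumptions rather than part of the lab: the names `matched_filter` and `simulate_dll`, the normalization of the matched filter so that its peaks are ±1, the arbitrary starting index of 20, and the use of direct index stepping in place of the hardware-style sampling counter.

```python
def matched_filter(x, t_symb):
    """Moving average over the last t_symb samples (peaks normalized to +/-1)."""
    y, acc = [], 0.0
    for n, sample in enumerate(x):
        acc += sample
        if n >= t_symb:
            acc -= x[n - t_symb]
        y.append(acc / t_symb)
    return y


def simulate_dll(y, t_symb, threshold, start):
    """Return the on-time sample indices chosen by the early/late loop."""
    on_time_indices = []
    stat_sum = 0.0                             # running sum of decision statistics
    t = start                                  # arbitrary first on-time index
    while t + 1 < len(y):
        early, on_time, late = y[t - 1], y[t], y[t + 1]
        on_time_indices.append(t)
        stat_sum += on_time * (late - early)   # decision statistic
        step = t_symb                          # default spacing
        if stat_sum > threshold:               # consistently too early
            step = t_symb + 1
            stat_sum = 0.0
        elif stat_sum < -threshold:            # consistently too late
            step = t_symb - 1
            stat_sum = 0.0
        t += step
    return on_time_indices


T_SYMB = 16
NUM_PERIODS = 200                              # "several hundred" would also work
square = ([1.0] * T_SYMB + [-1.0] * T_SYMB) * NUM_PERIODS
mf_out = matched_filter(square, T_SYMB)
picks = simulate_dll(mf_out, T_SYMB, threshold=1.0, start=20)
print(picks[:3], picks[-3:])
```

With this noiseless input the loop shifts its sampling instant by one sample at a time until the on-time samples land on the matched-filter peaks. Adding noise to `square` and varying `threshold` is a natural next experiment.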


Figure 2.10 and Figure 2.11 show the matched filter output and the on-time sampling times (indicated by the impulses) for the beginning of the input, before the DLL has locked on, as well as after 1000 samples (about 63 symbols' worth), when symbol-timing lock has been achieved. For each case, note the distance between the on-time sampling times and the peaks of the matched filter output.

Figure 2.10: Symbol sampling before DLL lock.


Figure 2.11: Symbol sampling after DLL lock.

2.1.1.3 DSP Implementation

Once your MATLAB simulation works, DSP implementation is relatively straightforward. To test your implementation, you can use the function generator to simulate a BPSK waveform by setting it to a square wave of the correct frequency for your symbol period. You should send the on-time sample and the matched-filter output to the D/A to verify that your system is working.

2.1.1.4 Extensions

As your final project will require some modification to the discussed BPSK signaling, you will want to refer to the listed references (see Proakis [9] and Blahut [1]) and consider some of the following questions regarding such modifications:

• How much noise is necessary to disrupt the DLL?
• What happens when the symbol sequence is random (not simply $[+1, -1, +1, -1, \ldots]$)?
• What would the matched filter look like for different symbol shapes?
• What other methods of symbol-timing recovery are available for your application?
• How does adding decision statistics help suppress the effects of noise?

2.1.2 Digital Receiver: Carrier Recovery (footnote 4)

2.1.2.1 Introduction

After gaining a theoretical understanding of the carrier recovery sub-system of a digital receiver, you will simulate the sub-system in MATLAB and implement it on the DSP. The sub-system described is specifically tailored to a non-modulated carrier. A complete implementation will require modifications to the design presented.

The phase-locked loop (PLL) is a critical component in coherent communications receivers that is responsible for locking on to the carrier of a received modulated signal. Ideally, the transmitted carrier frequency is known exactly and we need only to know its phase to demodulate correctly. However, due to imperfections at the transmitter, the actual carrier frequency may be slightly different from the expected frequency. For example, in the QPSK transmitter of Digital Transmitter: Introduction to Quadrature Phase-Shift Keying (see footnote 5), if the digital carrier frequency is $\frac{\pi}{2}$ and the D/A is operating at 44.1 kHz, then the expected analog carrier frequency is $f_c = \frac{\pi/2}{2\pi} \times 44.1 = 11.025$ kHz. If there is a slight change to the D/A sample rate (say $f_s = 44.05$ kHz), then there will be a corresponding change in the actual analog carrier frequency ($f_c = 11.0125$ kHz).

This difference between the expected and actual carrier frequencies can be modeled as a time-varying phase. Provided that the frequency mismatch is small relative to the carrier frequency, the feedback control of an appropriately calibrated PLL can track this time-varying phase, thereby locking on to both the correct frequency and the correct phase.

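Figure 2.12: PLL Block Diagram. (Blocks: two low-pass filters, phase detector, loop filter, gain $K$, and NCO. Signals: input $x[n] = \cos(\tilde{\omega}_c n)$; NCO outputs $\cos(\hat{\omega}_c n)$ and $\sin(\hat{\omega}_c n)$; $z_I[n]$ and $z_Q[n]$ before the low-pass filters; $y_I[n]$ and $y_Q[n]$ after them.)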

4 This content is available online at <http://cnx.org/content/m10478/2.16/>.

5 "Digital Transmitter: Introduction to Quadrature Phase-Shift Keying" <http://cnx.org/content/m10042/latest/>


2.1.2.1.1 Numerically controlled oscillator

In a complete coherent receiver implementation, carrier recovery is required since the receiver typically does not know the exact phase and frequency of the transmitted carrier. In an analog system this recovery is often implemented with a voltage-controlled oscillator (VCO) that allows for precise adjustment of the carrier frequency based on the output of a phase-detecting circuit.

In our digital application, this adjustment is performed with a numerically-controlled oscillator (NCO) (see Figure 2.12). A simple scheme for implementing an NCO is based on the following re-expression of the carrier sinusoid:

$$\sin(\omega_c n + \theta_c) = \sin(\theta[n]) \qquad (2.1)$$

where $\theta[n] = \omega_c n + \theta_c$ ($\omega_c$ and $\theta_c$ represent the carrier frequency and phase, respectively). Convince yourself that this time-varying phase term can be expressed as $\theta[n] = \sum_{m=1}^{n} \omega_c + \theta_c$ and then recursively as

$$\theta[n] = \theta[n-1] + \omega_c \qquad (2.2)$$

The NCO can keep track of the phase, $\theta[n]$, and force a phase offset in the demodulating carrier by incorporating an extra term in this recursive update:

$$\theta[n] = \theta[n-1] + \omega_c + d_{pd}[n] \qquad (2.3)$$

where $d_{pd}[n]$ is the amount of desired phase offset at time $n$. (What would $d_{pd}[n]$ look like to generate a frequency offset?)
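The recursive phase update can be tried out directly. The Python sketch below is an illustration under stated assumptions: the function name `nco_phase`, the carrier frequency of π/8, and the single phase step of π/2 at n = 10 are all made up for the example, and the modulo-2π wrap is added only to keep the phase bounded.

```python
import math


def nco_phase(omega_c, theta_c, d_pd):
    """Phase sequence theta[n] = theta[n-1] + omega_c + d_pd[n], kept in [0, 2*pi).

    theta[0] is the initial carrier phase theta_c; d_pd[0] is unused.
    """
    two_pi = 2.0 * math.pi
    theta = theta_c % two_pi
    phases = [theta]
    for n in range(1, len(d_pd)):
        theta = (theta + omega_c + d_pd[n]) % two_pi
        phases.append(theta)
    return phases


OMEGA_C = math.pi / 8                  # hypothetical digital carrier frequency
offsets = [0.0] * 32
offsets[10] = math.pi / 2              # one-time phase step at n = 10
carrier = [math.sin(th) for th in nco_phase(OMEGA_C, 0.0, offsets)]
```

Before the step the output follows sin(ω_c n); from n = 10 onward it follows sin(ω_c n + π/2), since the extra term stays in the accumulated phase.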

2.1.2.1.2 Phase detector

The goal of the PLL is to maintain a demodulating sine and cosine that match the incoming carrier. Suppose $\omega_c$ is the believed digital carrier frequency. We can then represent the actual received carrier frequency as the expected carrier frequency with some offset, $\tilde{\omega}_c = \omega_c + \tilde{\theta}[n]$. The NCO generates the demodulating sine and cosine with the expected digital frequency $\omega_c$ and offsets this frequency with the output of the loop filter. The NCO frequency can then be modeled as $\hat{\omega}_c = \omega_c + \hat{\theta}[n]$. Using the appropriate trigonometric identities (see footnote 6, with $A$ the received-carrier argument and $B$ the NCO argument), the in-phase and quadrature signals can be expressed as

$$z_I[n] = \frac{1}{2}\left(\cos\left(\tilde{\theta}[n] - \hat{\theta}[n]\right) + \cos\left(2\omega_c + \tilde{\theta}[n] + \hat{\theta}[n]\right)\right) \qquad (2.4)$$

$$z_Q[n] = \frac{1}{2}\left(\sin\left(\hat{\theta}[n] - \tilde{\theta}[n]\right) + \sin\left(2\omega_c + \tilde{\theta}[n] + \hat{\theta}[n]\right)\right) \qquad (2.5)$$

After applying a low-pass filter to remove the double-frequency terms, we have

$$y_I[n] = \frac{1}{2}\cos\left(\tilde{\theta}[n] - \hat{\theta}[n]\right) \qquad (2.6)$$

$$y_Q[n] = \frac{1}{2}\sin\left(\hat{\theta}[n] - \tilde{\theta}[n]\right) \qquad (2.7)$$

Note that the filtered quadrature signal, $y_Q[n]$, is zero when the received carrier and internally generated waves are exactly matched in frequency and phase. When the phases are only slightly mismatched we can use the relation

$$\sin(\theta) \approx \theta, \quad \theta \text{ small} \qquad (2.8)$$

and let the current value of the quadrature channel approximate the phase difference: $y_Q[n] \approx \frac{1}{2}\left(\hat{\theta}[n] - \tilde{\theta}[n]\right)$. With the exception of the sign error (and the constant factor of 1/2), this difference is essentially how much we need to offset our NCO frequency (see footnote 7). To make sure that the sign of the phase estimate is right, in this example the phase detector is simply negative one times the value of the quadrature signal. In a more advanced receiver, information from both the in-phase and quadrature branches is used to generate an estimate of the phase error (see footnote 8).

6 $\cos(A)\cos(B) = \frac{1}{2}\left(\cos(A-B) + \cos(A+B)\right)$ and $\cos(A)\sin(B) = \frac{1}{2}\left(\sin(B-A) + \sin(A+B)\right)$.

2.1.2.1.3 Loop filter

The phase mismatch estimate is fed to the NCO via a loop filter, often a simple low-pass filter. For this exercise you can use a one-tap IIR filter,

$$y[n] = \beta x[n] + \alpha y[n-1] \qquad (2.9)$$

To ensure unity gain at DC, we select $\beta = 1 - \alpha$.

It is suggested that you start by choosing $\alpha = 0.6$ and $K = 0.15$ for the loop gain. Once you have a working system, investigate the effects of modifying these values.
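The unity-DC-gain property of (2.9) is easy to confirm numerically. The following Python sketch is illustrative only: the function name `loop_filter` and the 50-sample unit-step input are assumptions made for the example.

```python
def loop_filter(x, alpha):
    """One-tap IIR filter y[n] = beta*x[n] + alpha*y[n-1] with beta = 1 - alpha."""
    beta = 1.0 - alpha
    y_prev = 0.0
    y = []
    for sample in x:
        y_prev = beta * sample + alpha * y_prev
        y.append(y_prev)
    return y


ALPHA = 0.6                               # suggested starting value from the text
step_response = loop_filter([1.0] * 50, ALPHA)
print(step_response[0], step_response[-1])
```

For a constant (DC) input of 1 the output starts at β and settles toward 1, which is what unity gain at DC means. Values of α closer to 1 make the filter settle more slowly.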

2.1.2.2 MATLAB Simulation

Simulate the PLL system shown in Figure 2.12 using MATLAB. As with the DLL simulation, you will have to simulate the PLL on a sample-by-sample basis.

Use (2.3) to implement your NCO in MATLAB. However, to ensure that the phase term does not grow to infinity, you should use addition modulo $2\pi$ in the phase update relation. This can be done by setting $\theta[n] = \theta[n] - 2\pi$ whenever $\theta[n] > 2\pi$.
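A sample-by-sample loop of this kind might be organized as in the Python sketch below (Python is used only for illustration; the exercise calls for MATLAB). Several choices are assumptions and not part of the lab: the function name `simulate_pll`, the unmodulated input with an initial phase of 1 radian, and in particular the two-sample average used as the low-pass filter, which removes the double-frequency term only because the carrier here sits near π/2 so that the double-frequency term falls near π.

```python
import math


def simulate_pll(num_samples, omega_c, omega_in, phase_in=0.0,
                 alpha=0.6, gain_k=0.15):
    """Return (y_i, y_q), the low-pass filtered in-phase and quadrature outputs."""
    two_pi = 2.0 * math.pi
    beta = 1.0 - alpha                 # unity DC gain for the loop filter
    theta = 0.0                        # NCO phase
    z_i_prev = z_q_prev = 0.0
    loop_out = 0.0
    y_i, y_q = [], []
    for n in range(num_samples):
        x = math.cos(omega_in * n + phase_in)      # received carrier
        z_i = x * math.cos(theta)                  # demodulate with NCO cosine
        z_q = x * math.sin(theta)                  # demodulate with NCO sine
        y_i_n = 0.5 * (z_i + z_i_prev)             # two-sample average: null at pi
        y_q_n = 0.5 * (z_q + z_q_prev)
        z_i_prev, z_q_prev = z_i, z_q
        phase_error = -y_q_n                       # phase detector: -1 * quadrature
        loop_out = beta * phase_error + alpha * loop_out
        # NCO update (2.3) with the scaled loop-filter output as the offset.
        theta = (theta + omega_c + gain_k * loop_out) % two_pi
        y_i.append(y_i_n)
        y_q.append(y_q_n)
    return y_i, y_q


OMEGA_C = math.pi / 2
OMEGA_IN = OMEGA_C + math.pi / 1024    # small frequency offset
y_i, y_q = simulate_pll(2000, OMEGA_C, OMEGA_IN, phase_in=1.0)
print(round(y_i[-1], 3), round(y_q[-1], 3))
```

Once the loop has settled, the in-phase output sits close to 1/2. The quadrature output settles near a small nonzero value rather than exactly zero, because a constant frequency offset requires a constant correction at the loop-filter output.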

Figure 2.13 illustrates how the proposed PLL will behave when given a modulated BPSK waveform. In this case the transmitted carrier frequency was set to $\tilde{\omega}_c = \frac{\pi}{2} + \frac{\pi}{1024}$ to simulate a frequency offset.

7 If $\hat{\theta}[n] - \tilde{\theta}[n] > 0$ then $\hat{\theta}[n]$ is too large and we want to decrease our NCO phase.

8 What should the relationship between the I and Q branches be for a digital QPSK signal?

Figure 2.13: Output of PLL sub-system for BPSK modulated carrier. (Plot of the in-phase and quadrature outputs versus sample index, roughly 100 to 800, on a vertical scale from −1 to 1.)

Note that an amplitude transition in the BPSK waveform (from +1 to −1 or back) is equivalent to a phase shift of the carrier by $\pi$. Immediately after this phase change occurs, the PLL begins to adjust the phase to force the quadrature component to zero (and the in-phase component to 1/2). Why would this phase detector not work in a real BPSK environment? How could it be changed to work?

2.1.2.3 DSP Implementation

As you begin to implement your PLL on the DSP, it is highly recommended that you implement and test your NCO block first before completing the rest of your phase-locked loop.

2.1.2.3.1 Sine-table interpolation

Your NCO must be able to produce a sinusoid with continuously variable frequency. Computing values of $\sin(\theta[n])$ on the fly would require a prohibitive amount of computation and program complexity; a look-up table is a better alternative.

Suppose a sine table stores $N$ samples from one cycle of the waveform: $\sin\left(\frac{2\pi k}{N}\right)$, $k = \{0, \ldots, N-1\}$. Sine waves with discrete frequencies $\omega = \frac{2\pi}{N} p$ are easily obtained by outputting every $p$th value in the table (and using circular addressing). The continuously variable frequency of your NCO will require non-integer increments, however. This raises two issues: first, what sort of interpolation should be used to get the in-between sine samples, and second, how to maintain a non-integer pointer into the sine table.

You may simplify the interpolation problem by using "lower-neighbor" interpolation, i.e., by using the integer part of your pointer. Note that the full-precision, non-integer pointer must be maintained in memory so that the fractional part is allowed to accumulate and carry over into the integer part; otherwise, your phase will not be accurate over long periods. For a long enough sine table, this approximation will adjust the NCO frequency with sufficient precision (see footnote 9).

Maintaining a non-integer pointer is more difficult. In earlier exercises, you have used the auxiliary registers (ARx) to manage pointers with integer increments. The auxiliary registers are not suited for the non-integer pointers needed in this exercise, however, so another method is required. One possibility is to perform addition in the accumulator with a modified decimal point. For example, with $N = 256$, you need eight bits to represent the integer portion of your pointer. Interpret the low 16 bits of the accumulator to have a decimal point seven bits up from the bottom; this leaves nine bits to store the integer part above the decimal point. To increment the pointer by one step, add a 15-bit value to the low part of the accumulator, then zero the top bit to ensure that the value in the accumulator is greater than or equal to zero and less than 256 (see footnote 10). To use the integer part of this pointer, shift the accumulator contents seven bits to the right, add the starting address of the sine table, and store the low part into an ARx register. The auxiliary register now points to the correct sample in the sine table.

As an example, for a nominal carrier frequency $\omega = \frac{\pi}{8}$ and sine table length $N = 256$, the nominal step size is an integer $p = \frac{\pi}{8} N \frac{1}{2\pi} = 16$. Interpret the 16-bit pointer as having nine bits for the integer part, followed by a decimal point and seven bits for the fractional part. The corresponding literal (integer) value added to the accumulator would be $16 \times 2^7 = 2048$ (see footnote 11).
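The pointer arithmetic described above can be modeled with ordinary integers before committing it to assembly. The Python sketch below is an illustration under stated assumptions: the names `nco_table`, `FRAC_BITS`, and `POINTER_MASK` are hypothetical, and the 15-bit mask stands in for zeroing the top bit of the 16-bit pointer.

```python
import math

N = 256                                   # sine table length
FRAC_BITS = 7                             # fractional bits below the decimal point
POINTER_MASK = (1 << 15) - 1              # keep 8 integer + 7 fractional bits
SINE_TABLE = [math.sin(2.0 * math.pi * k / N) for k in range(N)]


def nco_table(step, num_samples):
    """Generate samples by stepping a fixed-point pointer through the sine table.

    'step' is the literal value added to the pointer each sample, i.e. the
    table increment scaled by 2**FRAC_BITS.
    """
    pointer = 0
    out = []
    for _ in range(num_samples):
        index = pointer >> FRAC_BITS      # integer part: lower-neighbor lookup
        out.append(SINE_TABLE[index])
        pointer = (pointer + step) & POINTER_MASK   # wrap to stay below 256
    return out


samples = nco_table(16 << FRAC_BITS, 64)  # step literal 2048, as in the text
```

With the literal 2048 the pointer advances exactly 16 table entries per sample, so the output is a sinusoid of frequency π/8. Non-multiples of 128 leave a fractional remainder that accumulates and occasionally carries into the integer part.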

2.1.2.3.2 Extensions

You may want to refer to Proakis [10] and Blahut [2]. These references may help you think about the following questions:

• How does the noise affect the described carrier recovery method?
• What should the phase-detector look like for a BPSK modulated carrier? (Hint: You would need to consider both the in-phase and quadrature channels.)
• How does $\alpha$ affect the bandwidth of the loop filter?
• How do the loop gain and the bandwidth of the loop filter affect the PLL's ability to lock on to a carrier frequency mismatch?

2.2 Audio Effects

2.2.1 Audio Effects: Using External Memory (footnote 12)

2.2.1.1 Introduction

Many audio effects require storing thousands of samples in memory on the DSP. Because there is not enough memory on the DSP microprocessor itself to store so many samples, external memory must be used.

In this exercise, you will use external memory to implement a long audio delay and an audio echo. Refer to Core File: Accessing External Memory on TI TMS320C54x (Section 3.2.2) for a description and examples of accessing external memory.

9 Of course, nearest-neighbor interpolation could be implemented with a small amount of extra code.

10 How is this similar to the addition modulo $2\pi$ discussed in the MATLAB Simulation (Section 2.1.2.2: MATLAB Simulation)?

11 If this value were 2049, what would be the output frequency of the NCO?

12 This content is available online at <http://cnx.org/content/m10480/2.17/>.


2.2.1.2 Delay and Echo Implementation

You will implement three audio effects: a long, fixed-length delay, a variable-length delay, and a feedback echo.

2.2.1.2.1 Fixed-length delay implementation

First, implement the 131,072-sample delay shown in Figure 2.14 using the READPROG and WRITPROG macros. Use memory locations 010000h-02ffffh in external Program RAM to do this; you may also want to use the dld and dst opcodes to store and retrieve the 32-bit addresses for the accumulators. Note that these two operations store the words in memory in big-endian order, with the high-order word first.

Figure 2.14: Fixed-Length Delay (input ch 1 → $z^{-131072}$ → output ch 1)

Remember that arithmetic operations that act on the accumulators, such as the add instruction, operate on the complete 32- or 40-bit value. Also keep in mind that since 131,072 is a power of two, you can use masking (via the and instruction) to implement the circular buffer easily. This delay will be easy to verify on the oscilloscope. (How long, in seconds, do you expect this delay to be?)
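The masking trick can be prototyped at small scale before writing the assembly. The Python sketch below is illustrative only: the class name `FixedDelay` is hypothetical, and a buffer of 8 samples stands in for the 131,072-sample buffer in external memory.

```python
class FixedDelay:
    """Circular-buffer delay whose length is a power of two."""

    def __init__(self, length):
        if length <= 0 or length & (length - 1):
            raise ValueError("length must be a power of two")
        self.buffer = [0.0] * length
        self.mask = length - 1            # e.g. 8 -> 0b111
        self.index = 0

    def process(self, sample):
        delayed = self.buffer[self.index]  # value written 'length' calls ago
        self.buffer[self.index] = sample   # overwrite with the newest sample
        # AND with the mask wraps the index without a compare-and-branch.
        self.index = (self.index + 1) & self.mask
        return delayed


delay = FixedDelay(8)
out = [delay.process(float(n)) for n in range(1, 13)]
print(out)
```

The first eight outputs are the initial zeros in the buffer; after that each input reappears eight samples later. The same read-then-write pattern applies to the full-size buffer, with the mask covering the buffer's address range.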

2.2.1.2.2 Variable-delay implementation

Once you have your fixed-length delay working, make a copy and modify it so that the delay can be changed to any length between zero (or one) and 131,072 samples by changing the value stored in one double-word pair in memory. You should keep the buffer length equal to 131,072 and change only your addressing of the sample being read back; it is more difficult to change the buffer size to a length that is not a power of two.

Verify that your code works as expected by timing the delay from input to output and ensuring that it is approximately the correct length.

2.2.1.2.3 Feedback-echo implementation

Last, copy and modify your code so that the value taken from the end of the variable delay from Variable-delay implementation (Section 2.2.1.2.2: Variable-delay implementation) is multiplied by a gain factor and then added back into the input, and the result is both saved into the delay line and sent out to the digital-to-analog converters. Figure 2.15 shows the block diagram. (It may be necessary to multiply the input by a gain as well to prevent overflow.) This will make a one-tap feedback echo, a simple audio effect that sounds remarkably good. To test the effect, connect the DSP EVM input to a CD player or microphone and connect the output to a loudspeaker. Verify that the echo can be heard multiple times, and that the spacing between echoes matches the delay length you have chosen.
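The signal flow of the feedback echo, including the variable read offset into a fixed-size buffer, can be sketched as follows. This Python fragment is an illustration under stated assumptions: the function name `feedback_echo`, the parameter names `g_in` and `g_fb` for the two gains, and the tiny 8-sample buffer are all chosen for the example.

```python
def feedback_echo(x, delay, g_in, g_fb, buffer_len=8):
    """One-tap feedback echo: w[n] = g_in*x[n] + g_fb*w[n - delay].

    The buffer length stays fixed (a power of two); only the read
    address changes with 'delay', which must be in 1..buffer_len.
    """
    if buffer_len & (buffer_len - 1) or not 1 <= delay <= buffer_len:
        raise ValueError("need power-of-two buffer and 1 <= delay <= buffer_len")
    mask = buffer_len - 1
    buf = [0.0] * buffer_len
    write = 0
    out = []
    for sample in x:
        read = (write - delay) & mask        # address of the sample 'delay' ago
        w = g_in * sample + g_fb * buf[read]
        buf[write] = w                       # save into the delay line
        write = (write + 1) & mask
        out.append(w)                        # and send to the output
    return out


impulse = [1.0] + [0.0] * 11
echo = feedback_echo(impulse, delay=4, g_in=1.0, g_fb=0.5)
print(echo)
```

An impulse at the input reappears every `delay` samples, scaled by another factor of `g_fb` each time. A feedback gain with magnitude below one makes the echoes die away.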

Figure 2.15: Feedback Echo (input ch 1, gains G1 and G2, delay $z^{-n}$, output ch 1)

2.2.2 Audio Effects: Real-Time Control with the Serial Port (footnote 13)

2.2.2.1 Implementation

For this exercise, you will extend the system from Audio Effects: Using External Memory (Section 2.2.1) to generate a feedback-echo effect. You will then extend this echo effect to use the serial port on the DSP EVM. The serial interface will receive data from a MATLAB GUI that allows the two system gains and the echo delay to be changed using on-screen sliders.

2.2.2.1.1 Feedback system implementation

Figure 2.16: Feedback System with Test Points (inputs ch 1 and ch 2, gains G1 and G2, delay $z^{-n}$, outputs ch 1 through ch 4)

First, modify code from Audio Effects: Using External Memory (Section 2.2.1) to create the feedback-echo system shown in Figure 2.16. A one-tap feedback echo is a simple audio effect that sounds remarkably good. You will use both channels of input by summing the two inputs so that either or both may be used as an input to the system. Also, send several test signals to the six-channel board's D/A converters:

• The summed input signal
• The input signal after gain stage G1
• The data going into the long delay
• The data coming out of the delay

You will also need to set both the input gain G1 and the feedback gain G2 to prevent overflow.

As you implement this code, ensure that the delay n and the gain values G1 and G2 are stored in memory and can be easily changed using the debugger. If you do this, it will be easier to extend your code to accept its parameters from MATLAB in MATLAB Interface Implementation (Section 2.2.2.1.2: MATLAB interface implementation).

13 This content is available online at <http://cnx.org/content/m10483/2.24/>.

To test your echo, connect a CD player or microphone to the input of the DSP EVM, and connect the output of the DSP EVM to a loudspeaker. Verify that an input signal echoes multiple times in the output and that the spacing between echoes matches the delay length you have chosen.

2.2.2.1.2 MATLAB interface implementation

After studying the MATLAB interface outlined at the end of Using the Serial Port with a MATLAB GUI (see footnote 14), write MATLAB code to send commands to the serial interface based on three sliders: two gain sliders (for G1 and G2) and one delay slider (for n). Then modify your code to accept those commands and change the values for G1, G2 and n. Make sure that n can be set to values spanning the full range of 0 to 131,072, although it is not necessary that every number in that range be represented.

2.3 Surround Sound

2.3.1 Surround Sound: Passive Encoding and Decoding (footnote 15)

2.3.1.1 Introduction

To begin understanding how to decode the Dolby Pro Logic Surround Sound standard, you will implement a Pro Logic encoder and a passive surround sound decoder. This decoder operates on many of the same principles as the more sophisticated commercial systems. Significantly more technical information regarding Dolby Pro Logic can be found in Gundry [5].

2.3.1.2 Encoder

You will create a MATLAB implementation of the passive encoder given by the block diagram in Figure 2.17.

[Block diagram: inputs Left, Center, Right, and Surround; blocks labeled Hilbert Transform, BPF (100 Hz − 7 kHz), and Dolby NR; outputs Lt and Rt.]

Figure 2.17: Dolby Pro Logic Encoder

The encoder block diagram shows four input signals: Left, Center, Right, and Surround. These are audio signals created by a sound designer during movie production that are intended to play back from speakers positioned at the left side, at the front-center, at the right side, and at the rear of a home theater. The system in the block diagram encodes these four channels of audio on two output channels, Lt and Rt, in such a way that an appropriately designed decoder can approximately recover the original four channels. Additionally, to accommodate those who do not use a surround sound receiver, the encoder outputs are listenable when played back on a stereo (two-channel) system, even retaining the correct left-right balance.

14"Using the Serial Port with a MATLAB GUI" <http://cnx.org/content/m12062/latest/>
15This content is available online at <http://cnx.org/content/m10484/2.13/>.

The basic components of the encoder are multipliers, adders, a Hilbert transform, a band-pass filter, and a Dolby Noise Reduction encoder. If you wish to implement Dolby Noise Reduction, refer to Dressler [4]. The other components are discussed below.

The transfer function of the Hilbert Transform is shown in Figure 2.18. The Hilbert Transform is an ideal (unrealizable) all-pass filter with a phase shift of −90°. Observe that a cosine input becomes a sine and a sine input becomes a negative cosine. In MATLAB, generate a cosine and sine signal of some frequency and use the hilbert function to perform on each signal an approximation to the Hilbert Transform. (Why is the Hilbert Transform unrealizable?) The hilbert function returns a complex signal whose real part is the original signal; the imaginary part of its output (i.e., imag(hilbert(signal))) will be the −90° phase-shifted version of the original signal. Plot each signal to confirm your expectations.

[Plot of H(ω) versus ω: H(ω) = j for −π < ω < 0 and H(ω) = −j for 0 < ω < π.]

Figure 2.18: Hilbert transform transfer function

For the band-pass filter, design a second-order Butterworth filter using the butter function in MATLAB.
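The Hilbert transform check and the band-pass design described above can be sketched as follows. The sample rate of 44.1 kHz, the block length, and the test frequency are assumptions chosen for illustration. "Second-order" is interpreted here as `butter(1, ...)` with a two-element band, which yields a second-order band-pass; other readings are possible.

```matlab
% Sketch: verify the -90 degree phase shift and design the band-pass filter.
fs = 44100;                       % assumed sample rate in Hz
N  = 1024;                        % assumed block length
n  = 0:N-1;
w0 = 2*pi*8/N;                    % exactly 8 cycles per block (no leakage)
c  = cos(w0*n);
s  = sin(w0*n);
hc = imag(hilbert(c));            % expected to be close to  sin(w0*n)
hs = imag(hilbert(s));            % expected to be close to -cos(w0*n)
err_c = max(abs(hc - s));         % deviation from the ideal result
err_s = max(abs(hs + c));

% Band-pass filter for the surround path, 100 Hz to 7 kHz.
% butter(1, [lo hi]) returns a second-order band-pass (three coefficients).
[b, a] = butter(1, [100 7000] / (fs/2));
```

Choosing a test frequency that fits a whole number of cycles into the block keeps the FFT-based approximation essentially exact; other frequencies show edge effects worth plotting.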

2.3.1.2.1 Generating a surround signal

Create four channels of audio to encode as a Pro Logic Surround Signal. Use simple mixing techniques to generate the four channels. For example, use a voice signal for the center channel and fade a roaming sound such as a helicopter from left to right and front to back. In MATLAB, use the wavread and auread functions to read .wav and .au audio files, which can be found on the Internet. (In recent MATLAB releases, audioread replaces both functions.)
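A minimal sketch of mixing four channels and encoding them is shown below. Several things here are assumptions, not taken from Figure 2.17: the gains of 1/sqrt(2) on the center and surround paths, the signs with which the phase-shifted surround enters Lt and Rt, and the omission of Dolby Noise Reduction. The sine tones are stand-ins for real recordings.

```matlab
% Sketch of a passive Pro Logic style encoder (gains and signs are assumed).
fs = 44100;                         % assumed sample rate in Hz
t  = (0:fs-1) / fs;                 % one second of audio
voice = sin(2*pi*300*t);            % stand-in for a voice recording
heli  = sin(2*pi*120*t);            % stand-in for a roaming sound
pan   = t / t(end);                 % ramps from 0 to 1 over the clip

C = voice;                          % center channel
L = (1 - pan) .* heli;              % roaming sound fades out of the left
R = pan .* heli;                    % and into the right
S = 0.5 * pan .* heli;              % and gradually toward the rear

g = 1 / sqrt(2);                    % assumed -3 dB gain
[b, a] = butter(1, [100 7000] / (fs/2));   % surround band-pass filter
Sh = imag(hilbert(filter(b, a, S)));       % band-limited, phase-shifted surround

Lt = L + g*C + g*Sh;                % surround enters Lt and Rt in antiphase
Rt = R + g*C - g*Sh;
```

Because the surround term appears with opposite signs, it cancels in Lt + Rt, while the center term cancels in Lt − Rt.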

2.3.1.3 Decoder

Implement the passive decoder shown in Figure 2.19 on the DSP. Use an appropriate time delay based on the distance between the front and back speakers and the speed of sound.


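[Block diagram: inputs Lt and Rt; outputs Left, Right, Center, and Surround; the diagram includes a subtraction, a 7 kHz low-pass filter (LPF), and a time delay.]

Figure 2.19: Dolby Pro Logic Passive Decoder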

Is there significant crosstalk between the front and surround speakers? Do you get good separation between left and right speakers? Can you explain how the decoder recovers approximations to the original four channels?
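Before moving to the DSP, the decoder arithmetic can be tried in MATLAB. The sketch below assumes the usual passive sum/difference matrix (Center = Lt + Rt, Surround derived from Lt − Rt), a speaker spacing of 3 m, and a speed of sound of 343 m/s; none of these values come from Figure 2.19.

```matlab
% Sketch of a passive decoder with a center-only test input (values assumed).
fs      = 44100;                     % assumed sample rate in Hz
c_sound = 343;                       % assumed speed of sound in m/s
d       = 3;                         % assumed front-to-back distance in m
delay   = round(d / c_sound * fs);   % surround delay in samples

n   = 0:fs-1;
ctr = sin(2*pi*440*n/fs);            % center-only program material
Lt  = ctr / sqrt(2);                 % center appears equally in Lt and Rt
Rt  = ctr / sqrt(2);

Lout = Lt;                           % left and right pass straight through
Rout = Rt;
Cout = Lt + Rt;                      % sum recovers the center
dif  = Lt - Rt;                      % difference carries the surround
[b, a] = butter(2, 7000 / (fs/2));   % 7 kHz low-pass for the surround path
Sout = filter(b, a, [zeros(1, delay), dif(1:end-delay)]);
```

For this input the surround output is silent, yet the center signal still plays from the left and right speakers. That leakage is the kind of crosstalk the questions above ask about.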

2.3.1.4 Extensions

Differences in power levels between channels are used to enhance the directional effect in what is called "active decoding." One way to find the power level in a signal is to square it and pass the squared signal through a very narrow-band low-pass filter (f ≤ 80 Hz). How is the low-frequency content of the squared signal related to the power of the original signal? Remember that squaring a signal in the time domain is equivalent to convolving the signal with itself in the frequency domain.
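The square-and-smooth power estimate can be sketched in MATLAB. A one-pole low-pass with pole `p` stands in for the narrow-band filter; the pole value, sample rate, and test tone are assumptions for illustration only.

```matlab
% Sketch: estimate signal power by squaring and low-pass filtering.
fs = 8000;                             % assumed sample rate in Hz
n  = 0:2*fs-1;                         % two seconds of samples
A  = 0.8;                              % test amplitude
x  = A * sin(2*pi*440*n/fs);           % test tone
p  = 0.999;                            % assumed pole of a one-pole low-pass
pw = filter(1 - p, [1 -p], x.^2);      % smoothed squared signal (unity DC gain)
est = mean(pw(end-fs+1:end));          % average over the final second
```

For a sinusoid of amplitude A the smoothed value settles near A^2/2, the average power of the tone.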

To implement a very narrow-band low-pass filter, you may consider using the Chamberlin filter topology, described in Surround Sound: Chamberlin Filters (Section 2.3.2).

2.3.2 Surround Sound: Chamberlin Filters16

2.3.2.1 Introduction

Chamberlin filter topology is frequently used in music applications where very narrow-band, low-pass filters are necessary. Chamberlin implementations do not suffer from some stability problems that arise in direct-form implementations of very narrow-band responses. For more information about IIR/FIR filter design for DSPs, refer to the Motorola Application Note [7].

2.3.2.2 Filter Topology

A Chamberlin filter is a simple two-pole IIR filter with the transfer function given in (2.10):

$$ H(z) = \frac{F_c^2\, z^{-1}}{1 - \left(2 - F_c Q_c - F_c^2\right) z^{-1} + \left(1 - F_c Q_c\right) z^{-2}} \tag{2.10} $$

where Fc determines the frequency where the filter peaks, and Qc (equal to 1/Q) determines the rolloff. Q is defined as the positive ratio of the center frequency to the bandwidth. A derivation and more detailed explanation is given in Dattorro [3]. The topology of the filter is shown in Figure 2.20. Note that the final feedback stage puts a pole just inside the unit circle on the real axis. For a response with smaller bandwidth, move the pole closer to the unit circle, but do not move it so far that the filter becomes unstable. Multiple second-order sections can be cascaded to yield a sharper rolloff.

16This content is available online at <http://cnx.org/content/m10479/2.15/>.

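[Block diagram: input x[n], output y[n], two gains labeled Fc, one gain labeled Qc, and delay branches labeled −z^{-1} and 0.9 z^{-1}.]

Figure 2.20: Chamberlin Filter Topology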

Figure 2.21 and Figure 2.22 show how the response of the filter varies with Qc and Fc.

[Plot: magnitude (0 to 25) versus frequency (0 to about 3.5), with curves for Qc = 0.5 and Qc = 1.0.]

Figure 2.21: Chamberlin filter responses for various Qc (Fc = 0.3)

[Plot: magnitude (0 to 35) versus frequency (0 to about 3.5), with curves for Fc = 0.2 and Fc = 0.3.]

Figure 2.22: Chamberlin filter responses for various Fc (Qc = 0.8333)

2.3.2.3 Exercise

First, create a MATLAB script that takes two parameters, Qc and Fc, and plots the frequency response of a filter with a transfer function given in (2.10). Then implement a Chamberlin filter on the DSP and compare its performance with that of your MATLAB simulation for the same values of Qc and Fc. What do you observe?
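A possible starting point for the script is sketched below. It assumes that (2.10) has the standard two-pole Chamberlin low-pass form written in the comment. It models only that two-pole section, so the curves need not match Figures 2.21 and 2.22 exactly. The parameter values are examples.

```matlab
% Sketch: frequency response of the two-pole Chamberlin section.
% Assumed form of (2.10):
%   H(z) = Fc^2 z^-1 / (1 - (2 - Fc*Qc - Fc^2) z^-1 + (1 - Fc*Qc) z^-2)
Fc = 0.3;                               % example frequency parameter
Qc = 0.5;                               % example rolloff parameter (1/Q)
w  = linspace(0, pi, 1024);             % digital frequency in rad/sample
zi = exp(-1j*w);                        % z^-1 evaluated on the unit circle
H  = (Fc^2 * zi) ./ (1 - (2 - Fc*Qc - Fc^2)*zi + (1 - Fc*Qc)*zi.^2);
[pk, ipk] = max(abs(H));                % peak gain and its index
w_peak = w(ipk);                        % frequency of the peak

plot(w, abs(H));
xlabel('frequency');
ylabel('magnitude');
```

Under this form the gain at zero frequency is one and the peak lies close to Fc, which gives two quick checks on any DSP implementation.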


2.4 Adaptive Filtering

2.4.1 Adaptive Filtering: LMS Algorithm17

2.4.1.1 Introduction

Figure 2.23 is a block diagram of system identification using adaptive filtering. The objective is to change (adapt) the coefficients of an FIR filter, W, to match as closely as possible the response of an unknown system, H. The unknown system and the adapting filter process the same input signal x[n] and have outputs d[n] (also referred to as the desired signal) and y[n].

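[Block diagram: the input x[n] drives both the unknown system H, which produces d[n], and the adaptive filter W, which produces y[n]; the two outputs are combined to form the error e[n].]

Figure 2.23: System identification block diagram.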

2.4.1.1.1 Gradient-descent adaptation

The adaptive filter, W, is adapted using the least mean-square (LMS) algorithm, which is the most widely used adaptive filtering algorithm. First the error signal, e[n], is computed as e[n] = d[n] − y[n], which measures the difference between the output of the adaptive filter and the output of the unknown system. On the basis of this measure, the adaptive filter will change its coefficients in an attempt to reduce the error. The coefficient update relation is a function of the error signal squared and is given by

$$ h_{n+1}[i] = h_n[i] + \frac{\mu}{2}\left(-\frac{\partial |e|^2}{\partial h_n[i]}\right) \tag{2.11} $$

The term inside the parentheses represents the gradient of the squared-error with respect to the ith coefficient. The gradient is a vector pointing in the direction of the change in filter coefficients that will cause the greatest increase in the error signal. Because the goal is to minimize the error, however, (2.11) updates the filter coefficients in the direction opposite the gradient; that is why the gradient term is negated. The constant µ is a step-size, which controls the amount of gradient information used to update each coefficient. After repeatedly adjusting each coefficient in the direction opposite to the gradient of the error, the adaptive filter should converge; that is, the difference between the unknown and adaptive systems should get smaller and smaller.

17This content is available online at <http://cnx.org/content/m10481/2.14/>.


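To express the gradient descent coefficient update equation in a more usable manner, we can rewrite the derivative of the squared-error term as

$$ \begin{aligned} \frac{\partial |e|^2}{\partial h[i]} &= 2\,\frac{\partial e}{\partial h[i]}\,e \\ &= 2\,\frac{\partial (d - y)}{\partial h[i]}\,e \\ &= \left(2\,\frac{\partial \left(d - \sum_{i=0}^{N-1} h[i]\,x[n-i]\right)}{\partial h[i]}\right)(e) \end{aligned} \tag{2.12} $$

$$ \frac{\partial |e|^2}{\partial h[i]} = 2\left(-x[n-i]\right) e \tag{2.13} $$

which in turn gives us the final LMS coefficient update,

$$ h_{n+1}[i] = h_n[i] + \mu\, e\, x[n-i] \tag{2.14} $$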

The step-size µ directly affects how quickly the adaptive filter will converge toward the unknown system. If µ is very small, then the coefficients change only a small amount at each update, and the filter converges slowly. With a larger step-size, more gradient information is included in each update, and the filter converges more quickly; however, when the step-size is too large, the coefficients may change too quickly and the filter will diverge. (It is possible in some cases to determine analytically the largest value of µ ensuring convergence.)

2.4.1.2 MATLAB Simulation

Simulate the system identification block diagram shown in Figure 2.23.

Previously in MATLAB, you used the filter command or the conv command to implement shift-invariant filters. Those commands will not work here because adaptive filters are shift-varying, since the coefficient update equation changes the filter's impulse response at every sample time. Therefore, implement the system identification block on a sample-by-sample basis with a for loop, similar to the way you might implement a time-domain FIR filter on a DSP. For the "unknown" system, use the fourth-order, low-pass, elliptic, IIR filter designed for the IIR Filtering: Filter-Design Exercise in MATLAB (Section 1.3.2).

Use Gaussian random noise as your input, which can be generated in MATLAB using the command randn. Random white noise provides signal at all digital frequencies to train the adaptive filter. Simulate the system with an adaptive filter of length 32 and a step-size of 0.02. Initialize all of the adaptive filter coefficients to zero. From your simulation, plot the error (or squared-error) as it evolves over time and plot the frequency response of the adaptive filter coefficients at the end of the simulation. How well does your adaptive filter match the "unknown" filter? How long does it take to converge?

Once your simulation is working, experiment with different step-sizes and adaptive filter lengths.
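The sample-by-sample structure can be sketched as follows, using the LMS update h[i] ← h[i] + µ e x[n − i]. To keep the sketch self-contained, a short hypothetical FIR filter `g` stands in for the "unknown" system; it is not the elliptic IIR filter that the lab asks for. The seed and the number of samples are also arbitrary choices.

```matlab
% Sketch: LMS system identification, one sample at a time.
rng(0);                          % fixed seed so the run is repeatable
N    = 32;                       % adaptive filter length (from the lab text)
mu   = 0.02;                     % step-size (from the lab text)
Ls   = 5000;                     % number of samples to simulate (assumed)
g    = [0.5 -0.3 0.2 0.1];       % hypothetical stand-in for the unknown system
x    = randn(1, Ls);             % white Gaussian input
d    = filter(g, 1, x);          % desired signal d[n]
h    = zeros(1, N);              % adaptive coefficients, h(i+1) holds h[i]
xbuf = zeros(1, N);              % xbuf(i+1) holds x[n-i]
e    = zeros(1, Ls);             % error history
for n = 1:Ls
    xbuf = [x(n), xbuf(1:N-1)];  % shift the newest sample into the delay line
    y    = h * xbuf.';           % adaptive filter output y[n]
    e(n) = d(n) - y;             % error e[n] = d[n] - y[n]
    h    = h + mu * e(n) * xbuf; % LMS coefficient update
end
```

Plotting `e.^2` on a logarithmic axis shows the convergence; with an IIR "unknown" system the error levels off above zero because a length-32 FIR filter can only approximate it.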

2.4.1.3 Processor Implementation

Use the same "unknown" filter as you used in the MATLAB simulation.

Although the coefficient update equation is relatively straightforward, consider using the lms instruction available on the TI processor, which is designed for this application and yields a very efficient implementation of the coefficient update equation.

To generate noise on the DSP, you can use the PN generator from the Digital Transmitter: Introduction to Quadrature Phase-Shift Keying18, but shift the PN register contents up to make the sign bit random. (If the sign bit is always zero, then the noise will not be zero-mean and this will affect convergence.) Send the desired signal, d[n], the output of the adaptive filter, y[n], and the error to the D/A for display on the oscilloscope.

When using the step-size suggested in the MATLAB simulation section, you should notice that the error converges very quickly. Try an extremely small µ so that you can actually watch the amplitude of the error signal decrease towards zero.

18"Digital Transmitter: Introduction to Quadrature Phase-Shift Keying" <http://cnx.org/content/m10042/latest/>


2.4.1.4 Extensions

If your project requires some modifications to the implementation here, refer to Haykin [6] and consider some of the following questions regarding such modifications:

• How would the system in Figure 2.23 change for different applications? (noise cancellation, equalization, etc.)
• What happens to the error when the step-size is too large or too small?
• How does the length of an adaptive FIR filter affect convergence?
• What types of coefficient update relations are possible besides the described LMS algorithm?

2.5 Speech Processing

2.5.1 Speech Processing: Theory of LPC Analysis and Synthesis19

2.5.1.1 Introduction

Linear predictive coding (LPC) is a popular technique for speech compression and speech synthesis. The theoretical foundations of both are described below.

2.5.1.1.1 Correlation coefficients

Correlation, a measure of similarity between two signals, is frequently used in the analysis of speech and other signals. The cross-correlation between two discrete-time signals x[n] and y[n] is defined as

$$ r_{xy}[l] = \sum_{n=-\infty}^{\infty} x[n]\, y[n-l] \tag{2.15} $$

where n is the sample index, and l is the lag or time shift between the two signals (Proakis and Manolakis [8], pg. 120). Since speech signals are not stationary, we are typically interested in the similarities between signals only over a short time duration (30 ms). In this case, the cross-correlation is computed only over a window of time samples and for only a few time delays l = {0, 1, . . . , P}.

Now consider the autocorrelation sequence rss[l], which describes the redundancy in the signal s[n].

$$ r_{ss}[l] = \frac{1}{N} \sum_{n=0}^{N-1} s[n]\, s[n-l] \tag{2.16} $$

where s[n], n = {−P, −P + 1, . . . , N − 1} are the known samples (see Figure 2.24) and the 1/N is a normalizing factor.

19This content is available online at <http://cnx.org/content/m10482/2.19/>.


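[Diagram: the samples s[n] for n = 0, . . . , N − 1 are aligned with the shifted sequence s[n − l], which spans indices −l to N − l − 1 and draws on stored samples back to index −P; the aligned samples are multiplied and accumulated to get rss[l].]

Figure 2.24: Computing the autocorrelation coefficients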

Another related method of measuring the redundancy in a signal is to compute its autocovariance

$$ r_{ss}[l] = \frac{1}{N-1} \sum_{n=l}^{N-1} s[n]\, s[n-l] \tag{2.17} $$

where the summation is over N − l products (the samples {s [−P] , . . . , s [−1]} are ignored).
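The two measures can be compared directly in MATLAB. This sketch assumes the autocorrelation is (1/N) times the sum over n = 0..N−1 (using the P stored past samples) and the autocovariance is 1/(N−1) times the sum over n = l..N−1. The test signal, P, and N are arbitrary, and `idx` is a hypothetical helper that maps a sample index n (which may be negative) to a MATLAB array index.

```matlab
% Sketch: autocorrelation (uses past samples) versus autocovariance.
P = 4;                                   % largest lag (assumed)
N = 64;                                  % window length (assumed)
nfull = -P:N-1;                          % all known sample indices
sfull = cos(2*pi*nfull/16);              % test signal with a 16-sample period
idx   = @(n) n + P + 1;                  % sample index -> MATLAB index

r_ac  = zeros(1, P+1);                   % r_ac(l+1)  holds the autocorrelation at lag l
r_cov = zeros(1, P+1);                   % r_cov(l+1) holds the autocovariance at lag l
for l = 0:P
    n = 0:N-1;                           % full window, reaches back to s[-l]
    r_ac(l+1)  = sum(sfull(idx(n)) .* sfull(idx(n - l))) / N;
    n = l:N-1;                           % only N - l products, no past samples
    r_cov(l+1) = sum(sfull(idx(n)) .* sfull(idx(n - l))) / (N - 1);
end
```

Because the window here holds a whole number of periods, the autocorrelation equals 0.5 cos(2πl/16) exactly, while the autocovariance deviates slightly as l grows.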

2.5.1.1.2 Linear prediction model

Linear prediction is a good tool for analysis of speech signals. Linear prediction models the human vocal tract as an infinite impulse response (IIR) system that produces the speech signal. For vowel sounds and other voiced regions of speech, which have a resonant structure and high degree of similarity over time shifts that are multiples of their pitch period, this modeling produces an efficient representation of the sound. Figure 2.25 shows how the resonant structure of a vowel could be captured by an IIR system.

[Block diagram: a single IIR block with transfer function 1 / (1 − a1 z^{-1} − a2 z^{-2} − . . . − aP z^{-P}).]

Figure 2.25: Linear Prediction (IIR) Model of Speech

The linear prediction problem can be stated as finding the coefficients ak which result in the best prediction (which minimizes mean-squared prediction error) of the speech sample s[n] in terms of the past samples s[n − k], k = {1, . . . , P}. The predicted sample ŝ[n] is then given by (Rabiner and Juang [11])

$$ \hat{s}[n] = \sum_{k=1}^{P} a_k\, s[n-k] \tag{2.18} $$

where P is the number of past samples of s[n] which we wish to examine.

Next we derive the frequency response of the system in terms of the prediction coefficients ak. In (2.18), when the predicted sample equals the actual signal (i.e., ŝ[n] = s[n]), we have

$$ \begin{aligned} s[n] &= \sum_{k=1}^{P} a_k\, s[n-k] \\ s(z) &= \sum_{k=1}^{P} a_k\, s(z)\, z^{-k} \\ s(z) &= \frac{1}{1 - \sum_{k=1}^{P} a_k z^{-k}} \end{aligned} \tag{2.19} $$

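The optimal solution to this problem is (Rabiner and Juang [11])

$$ a = \begin{pmatrix} a_1 & a_2 & \ldots & a_P \end{pmatrix}^T $$

$$ r = \begin{pmatrix} r_{ss}[1] & r_{ss}[2] & \ldots & r_{ss}[P] \end{pmatrix}^T $$

$$ R = \begin{pmatrix} r_{ss}[0] & r_{ss}[1] & \ldots & r_{ss}[P-1] \\ r_{ss}[1] & r_{ss}[0] & \ldots & r_{ss}[P-2] \\ \vdots & \vdots & \ddots & \vdots \\ r_{ss}[P-1] & r_{ss}[P-2] & \ldots & r_{ss}[0] \end{pmatrix} $$

$$ a = R^{-1} r \tag{2.20} $$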

Due to the Toeplitz property of the R matrix (it is symmetric with equal diagonal elements), an efficient algorithm is available for computing a without the computational expense of finding R^{-1}. The Levinson-Durbin algorithm is an iterative method of computing the predictor coefficients a (Rabiner and Juang [11], p. 115).

Initial Step: $E_0 = r_{ss}[0]$, $i = 1$

for $i = 1$ to $P$.

Steps

1. $k_i = \frac{1}{E_{i-1}} \left( r_{ss}[i] - \sum_{j=1}^{i-1} \alpha_{j,i-1}\, r_{ss}[|i-j|] \right)$
2. • $\alpha_{j,i} = \alpha_{j,i-1} - k_i\, \alpha_{i-j,i-1}$, $j = \{1, \ldots, i-1\}$
   • $\alpha_{i,i} = k_i$
3. $E_i = \left(1 - k_i^2\right) E_{i-1}$

After the final iteration, the predictor coefficients are $a_j = \alpha_{j,P}$.
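The three steps can be written with MATLAB's 1-based indexing by storing the lag-l autocorrelation in `r(l+1)`. The following is a sketch, not a reference implementation; the test autocorrelation 0.9^l is an arbitrary choice whose exact answer is a1 = 0.9 with all other coefficients zero, and the direct solve with `toeplitz` is included only as a cross-check.

```matlab
% Sketch: Levinson-Durbin recursion with 1-based storage.
P = 4;                              % prediction order (assumed)
r = 0.9 .^ (0:P);                   % r(l+1) holds the autocorrelation at lag l
alpha = zeros(P, P);                % alpha(j, i) holds alpha_{j,i}
k = zeros(1, P);                    % reflection coefficients k_i
E = r(1);                           % E_0 is the lag-0 autocorrelation
for i = 1:P
    acc = r(i+1);                   % step 1: numerator of k_i
    for j = 1:i-1
        acc = acc - alpha(j, i-1) * r(abs(i-j) + 1);
    end
    k(i) = acc / E;                 % divide by E_{i-1}
    alpha(i, i) = k(i);             % step 2: newest coefficient
    for j = 1:i-1
        alpha(j, i) = alpha(j, i-1) - k(i) * alpha(i-j, i-1);
    end
    E = (1 - k(i)^2) * E;           % step 3: updated prediction error
end
a = alpha(:, P).';                  % predictor coefficients a_1 .. a_P

% Cross-check against a direct solve of R a = r.
a_direct = (toeplitz(r(1:P)) \ r(2:P+1).').';
```

Storing lag l at position l+1 avoids a zero index for the lag terms, while the inner loops never touch column i−1 = 0 because they are empty when i = 1.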


2.5.1.1.3 LPC-based synthesis

It is possible to use the prediction coefficients to synthesize the original sound by applying δ[n], the unit impulse, to the IIR system with lattice coefficients ki, i = {1, . . . , P} as shown in Figure 2.26. Applying δ[n] to consecutive IIR systems (which represent consecutive speech segments) yields a longer segment of synthesized speech.

In this application, lattice filters are used rather than direct-form filters since the lattice filter coefficients have magnitude less than one and, conveniently, are available directly as a result of the Levinson-Durbin algorithm. If a direct-form implementation is desired instead, the α coefficients must be factored into second-order stages with very small gains to yield a more stable implementation.

[Block diagram: IIR lattice filter with input x[n], output y[n], delay elements D, and stage gains k1, −k1, k2, −k2, k3, −k3.]

Figure 2.26: IIR lattice filter implementation.

When each segment of speech is synthesized in this manner, two problems occur. First, the synthesized speech is monotonous, containing no changes in pitch, because the δ[n]'s, which represent pulses of air from the vocal cords, occur with fixed periodicity equal to the analysis segment length; in normal speech, we vary the frequency of air pulses from our vocal cords to change pitch. Second, the states of the lattice filter (i.e., past samples stored in the delay boxes) are cleared at the beginning of each segment, causing discontinuity in the output.

To estimate the pitch, we look at the autocorrelation coefficients of each segment. A large peak in the autocorrelation coefficient at lag l ≠ 0 implies the speech segment is periodic (or, more often, approximately periodic) with period l. In synthesizing these segments, we recreate the periodicity by using an impulse train as input and varying the delay between impulses according to the pitch period. If the speech segment does not have a large peak in the autocorrelation coefficients, then the segment is an unvoiced signal which has no periodicity. Unvoiced segments such as consonants are best reconstructed by using noise instead of an impulse train as input.

To reduce the discontinuity between segments, do not clear the states of the IIR model from one segment to the next. Instead, load the new set of reflection coefficients, ki, and continue with the lattice filter computation.
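A lattice synthesis loop that keeps its state between samples can be sketched as below. The sign convention is an assumption that may differ from Figure 2.26. It is chosen so that the result matches the direct-form all-pole model 1 / (1 − Σ a_k z^{-k}) when the a_k are obtained from the reflection coefficients by the Levinson-Durbin step-up relation. The coefficient values are hypothetical.

```matlab
% Sketch: all-pole (IIR) lattice filter driven by a unit impulse.
k = [0.5 -0.3 0.2];                 % hypothetical reflection coefficients, |k_i| < 1
P = numel(k);
x = [1, zeros(1, 49)];              % unit impulse input
b = zeros(1, P+1);                  % b(i) holds b_{i-1}[n-1], the delayed backward signals
y = zeros(size(x));
for n = 1:numel(x)
    f = x(n);                       % f_P[n], the input to the last stage
    bnew = zeros(1, P+1);
    for i = P:-1:1
        f = f + k(i) * b(i);        % f_{i-1}[n] = f_i[n] + k_i b_{i-1}[n-1]
        bnew(i+1) = b(i) - k(i)*f;  % b_i[n] = b_{i-1}[n-1] - k_i f_{i-1}[n]
    end
    bnew(1) = f;                    % b_0[n] = f_0[n] = y[n]
    y(n) = f;
    b = bnew;                       % state carries over; it is never cleared
end

% Cross-check: convert k to direct-form coefficients and use filter.
a = [];
for i = 1:P
    a = [a - k(i)*fliplr(a), k(i)]; % step-up: alpha_{j,i} from alpha_{j,i-1}
end
y_direct = filter(1, [1, -a], x);
```

To move to the next speech segment under this scheme, only `k` would be replaced while `b` is left untouched.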

2.5.1.2 Additional Issues

• Spanish vowels (mop, ace, easy, go, but) are easier to recognize using LPC.

• Error can be computed as $\vec{a}^T R\, \vec{a}$, where R is the autocovariance or autocorrelation matrix of a test segment and $\vec{a}$ is the vector of prediction coefficients of a template segment.

• A pre-emphasis filter before LPC, emphasizing frequencies of interest in the recognition or synthesis, can improve performance.

• The pitch frequency for males (80-150 Hz) is different from the pitch frequency for females.

• For voiced segments, $r_{ss}[T] / r_{ss}[0] \simeq 0.25$, where T is the pitch period.
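The pitch estimate described earlier (a large autocorrelation peak at a nonzero lag) can be sketched as follows. The test signal, the lag search range of 20 to 80 samples, and the use of 0.25 as a voiced/unvoiced decision threshold are assumptions made for illustration.

```matlab
% Sketch: estimate the pitch period from the autocorrelation peak.
fs   = 8000;                          % assumed sample rate in Hz
T0   = 50;                            % true period of the test signal in samples
N    = 400;                           % analysis window length (assumed)
Lmin = 20;                            % smallest lag searched (assumed)
Lmax = 80;                            % largest lag searched (assumed)
nfull = -Lmax:N-1;                    % window plus Lmax past samples
sfull = cos(2*pi*nfull/T0) + 0.5*cos(4*pi*nfull/T0);
base  = Lmax + 1 + (0:N-1);           % MATLAB indices of s[0] .. s[N-1]

r = zeros(1, Lmax+1);                 % r(l+1) holds the autocorrelation at lag l
for l = 0:Lmax
    r(l+1) = sum(sfull(base) .* sfull(base - l)) / N;
end
[pk, rel] = max(r(Lmin+1:Lmax+1));    % largest value over lags Lmin..Lmax
T      = Lmin + rel - 1;              % estimated pitch period in samples
voiced = (pk / r(1)) > 0.25;          % assumed decision threshold
f0     = fs / T;                      % estimated pitch frequency in Hz
```

For real speech the peak is less pronounced, so the search range and threshold usually need tuning by experiment.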


2.5.2 Speech Processing: LPC Exercise in MATLAB20

2.5.2.1 MATLAB Exercises

First, take a simple signal (e.g., one period of a sinusoid at some frequency) and plot its autocorrelation sequence for appropriate values of l. You may wish to use the xcorr MATLAB function to compare with your own version of this function. At what time shift l is rss[l] maximized and why? Is there any symmetry in rss[l]? What does rss[l] look like for periodic signals?

Next, write your own version of the Levinson-Durbin algorithm in MATLAB. Note that MATLAB uses indexing from 1 rather than 0. One way to resolve this problem is to start the loop with i = 2, then shift the variables k, E, α, and rss to start at i = 1 and j = 1. Be careful with indices such as i − j, since these could still be 0.

Apply your algorithm to a 20-30 ms segment of a speech signal. Use a microphone to record .wav audio files on the PC using Sound Recorder or a similar application. Typically, a sample rate of 8 kHz is a good choice for voice signals, which are approximately bandlimited to 4 kHz. You will use these audio files to test algorithms in MATLAB. The functions wavread, wavwrite, and sound will help you read, write and play audio files in MATLAB.

The output of the algorithm is the prediction coefficients ak (usually about P = 10 coefficients is sufficient), which represent the speech segment containing significantly more samples. The LPC coefficients are thus a compressed representation of the original speech segment, and we take advantage of this by saving or transmitting the LPC coefficients instead of the speech samples. Compare the coefficients generated by your function with those generated by the levinson or lpc functions available in the MATLAB toolbox. Next, plot the frequency response of the IIR model represented by the LPC coefficients (see Speech Processing: Theory of LPC Analysis and Synthesis (2.19)). What is the fundamental frequency of the speech segment? Is there any similarity in the prediction coefficients for different 20-30 ms segments of the same vowel sound? How could the prediction coefficients be used for recognition?
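The frequency response of the IIR model can be evaluated directly on the unit circle, as in this sketch. The two coefficients are hypothetical; they place a resonance near π/4 rad/sample so that the result is easy to check by eye.

```matlab
% Sketch: frequency response of 1 / (1 - sum_k a_k z^-k).
r0    = 0.95;                            % pole radius of the example resonance
theta = pi/4;                            % pole angle in rad/sample
a     = [2*r0*cos(theta), -r0^2];        % hypothetical prediction coefficients
P     = numel(a);
w     = linspace(0, pi, 512);            % digital frequency grid
A     = 1 - exp(-1j * w.' * (1:P)) * a.';   % denominator at each frequency
H     = 1 ./ A;                          % model frequency response
[~, imax] = max(abs(H));
w_peak = w(imax);                        % frequency of the strongest resonance

plot(w, 20*log10(abs(H)));
xlabel('frequency (rad/sample)');
ylabel('magnitude (dB)');
```

With coefficients from a real vowel segment, the peaks of this curve correspond to the resonances captured by the model.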

2.5.3 Speech Processing: LPC Exercise on TI TMS320C54x21

2.5.3.1 Implementation

The sample rate on the 6-channel DSP boards is fixed at 44.1 kHz, so decimate by a factor of 5 to achieve the sample rate of 8.82 kHz, which is more appropriate for speech processing.
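The decimation can be prototyped in MATLAB before writing the DSP code. This sketch assumes a 64th-order FIR anti-aliasing filter designed with fir1; the filter order and the test tone are arbitrary choices.

```matlab
% Sketch: decimate from 44.1 kHz to 8.82 kHz.
fs_in = 44100;                   % board sample rate in Hz
M     = 5;                       % decimation factor
n     = 0:fs_in-1;               % one second of samples
x     = sin(2*pi*1000*n/fs_in);  % 1 kHz test tone
h     = fir1(64, 1/M);           % low-pass, cutoff at the new Nyquist frequency
y     = filter(h, 1, x);         % remove content that would alias
y     = y(1:M:end);              % keep every fifth sample
fs_out = fs_in / M;              % resulting sample rate in Hz
```

On the DSP only every fifth filter output needs to be computed, which reduces the work by the same factor.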

Compute the autocorrelation or autocovariance coefficients of 256-sample blocks of input samples from a function generator for time shifts l = {0, 1, . . . , 15} (i.e., for P = 15) and display these on the oscilloscope with a trigger. (You may zero out the other 240 output samples to fill up the 256-sample block). For computing the autocorrelation, you will have to use memory to record the last 15 samples of the input due to the overlap between adjacent blocks. Compare the output on the oscilloscope with simulation results from MATLAB.

The next step is to use a speech signal as the input to your system. Use a microphone as input to the original thru6.asm22 code and adjust the gains in your system until the output uses most of the dynamic range of the system without saturating. Now, to capture and analyze a small segment of speech, write code that determines the start of a speech signal in the microphone input, records a few seconds of speech, and computes the autocorrelation or autocovariance coefficients. The start of a speech signal can be determined by comparing the input to some noise threshold; experiment to find a good value. For recording large segments of speech, you may need to use external memory. Refer to Core File: Accessing External Memory on TI TMS320C54x (Section 3.2.2) for more information.

Finally, incorporate your code which computes autocorrelation or autocovariance coefficients with the code which takes speech input and compare the results seen on the oscilloscope to those generated by MATLAB.

20This content is available online at <http://cnx.org/content/m10824/2.5/>.
21This content is available online at <http://cnx.org/content/m10825/2.6/>.
22http://cnx.org/content/m10825/latest/thru6.asm


2.5.3.1.1 Integer division (optional)

In order to implement the Levinson-Durbin algorithm, you will need to use integer division to do Step 1 (p. 97) of the algorithm. Refer to the Applications Guide[?] and the subc instruction for a routine that performs integer division.
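The shift-and-subtract idea behind such a routine can be emulated in MATLAB. This sketch assumes that each subc step subtracts the divisor shifted left by 15 bits from the accumulator, and then either keeps the difference shifted left by one with a 1 in the lowest bit, or just shifts the accumulator left by one. Verify this behavior against the TI documentation before relying on it. The operands are assumed to be positive 16-bit values.

```matlab
% Sketch: emulate 16 conditional-subtract steps for unsigned division.
num = 1000;                      % dividend (example value)
den = 7;                         % divisor (example value)
acc = num;                       % accumulator starts with the dividend
for it = 1:16
    t = acc - den * 2^15;        % trial subtraction of the aligned divisor
    if t >= 0
        acc = 2*t + 1;           % keep the difference, shift in a quotient bit of 1
    else
        acc = 2*acc;             % restore, shift in a quotient bit of 0
    end
end
q = mod(acc, 2^16);              % quotient ends up in the low 16 bits
r = floor(acc / 2^16);           % remainder ends up in the upper bits
```

For Step 1 of the Levinson-Durbin recursion the result is a fraction of magnitude less than one, so the numerator would need to be pre-scaled before dividing.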

2.6 Video Processing

2.6.1 Video Processing: Manuals23

2.6.1.1 Essential documentation for the 6000 series TI DSP

The following documentation will certainly prove useful:

• The IDK Programmer's Guide24

• The IDK User's Guide25

• The IDK Video Device Drivers User's Guide26

note: Other manuals may be found on TI's website27 by searching for TMS320C6000 IDK

2.6.2 Video Processing: Introduction to the IDK28

2.6.2.1 Introduction

The purpose of this lab is to acquaint you with the TI Image Developers Kit (IDK). The IDK contains a floating point C6711 DSP, and other hardware that enables real time video processing. In addition to the IDK, the video processing lab bench is equipped with an NTSC camera and a standard color computer monitor.

You will complete an introductory exercise to gain familiarity with the IDK programming environment. In the exercise, you will modify a C skeleton to horizontally flip and invert video input from the camera. The output of your video processing algorithm will appear in the top right quadrant of the monitor. In addition, you will analyze existing C code that implements filtering and edge detection algorithms to gain insight into IDK programming methods. The output of these "canned" algorithms, along with the unprocessed input, appears in the other quadrants of the monitor.

An additional goal of this lab is to give you the opportunity to discover tools for developing an original project using the IDK.
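The flip-and-invert operation acts on one image line at a time, so it can be prototyped in MATLAB before it is written in C. The sketch assumes 8-bit grayscale pixels and a 640-pixel line; the ramp used as test data is arbitrary.

```matlab
% Sketch: horizontally flip and invert one line of 8-bit pixels.
line_in  = uint8(mod(0:639, 256));     % synthetic 640-pixel test line
line_out = 255 - fliplr(line_in);      % mirror left to right, then invert
```

Applying the same operation twice returns the original line, which is a convenient sanity check for the C version as well.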

2.6.2.2 Video Processing Setup

The camera on the video processing lab bench generates an analog video signal in NTSC format. NTSC is a standard for transmitting and displaying video that is used in television. The signal from the camera is connected to the "composite input" on the IDK board (the yellow plug). This is illustrated in Figure 2-1 on page 2-3 of the IDK User's Guide[?]. Notice that the IDK board is actually two boards stacked on top of each other. The bottom board contains the C6711 DSP, where your image processing algorithms will run. The top board is the daughterboard, which contains hardware for interfacing with the camera input and monitor output. For future video processing projects, you may connect a video input other than the camera, such as the output from a DVD player. The output signal from the IDK is in RGB format, so that it may be displayed on a computer monitor.

23This content is available online at <http://cnx.org/content/m10889/2.5/>.
24http://www-s.ti.com/sc/psheets/spru495a/spru495a.pdf
25http://www-s.ti.com/sc/psheets/spru494a/spru494a.pdf
26http://www-s.ti.com/sc/psheets/spru499/spru499.pdf
27http://www.ti.com
28This content is available online at <http://cnx.org/content/m10926/2.7/>.


At this point, a description of the essential terminology of the IDK environment is in order. The video input is first decoded and then sent to the FPGA, which resides on the daughterboard. The FPGA is responsible for the filling of the frame buffer and video capture. For a detailed description of the FPGA and its functionality, we advise you to read Chapter 2 of the IDK User's Guide[?].

The Chip Support Library (CSL) is an abstraction layer that allows the IDK daughterboard to be used with the entire family of TI C6000 DSPs (not just the C6711 that we're using); it takes care of what is different from chip to chip.

The Image Data Manager (IDM) is a set of routines responsible for moving data between on-chip internal memory and external memory on the board during processing. The IDM helps the programmer by taking care of the pointer updates and buffer management involved in transferring data. Your DSP algorithms will read and write to internal memory, and the IDM will transfer this data to and from external memory. Examples of external memory include temporary "scratch pad" buffers, the input buffer containing data from the camera, and the output buffer with data destined for the RGB output.

The TI C6711 DSP uses a different instruction set than the 5400 DSPs you are familiar with in lab. The IDK environment was designed with high level programming in mind, so that programmers would be isolated from the intricacies of assembly programming. Therefore, we strongly suggest that you do all your programming in C. Programs on the IDK typically consist of a main program that calls an image processing routine. The image processing routine may make several calls to specialized functions. These specialized functions consist of an outer wrapper and an inner component. The component performs processing on one line of an image. The wrapper oversees the processing of the entire image, using the IDM to move data back and forth between internal memory and external memory. In this lab, you will modify a component to implement the flipping and inverting algorithm.

In addition, the version of Code Composer that the IDK uses is different from the one you have used previously. The IDK uses Code Composer Studio v2.1. It is similar to the other version, but the process of loading code is slightly different.

2.6.2.3 Code Overview

The program flow for these image processing applications may be a bit different from your previous experiences in C programming. In most C programs, the main function is where program execution starts and ends. In this real-time application, the main function serves only to set up initializations for the cache, the CSL, and the DMA channel. When it exits, the main task, tskMainFunc(), will execute automatically, starting the DSP/BIOS. This is where our image processing application begins.

The tskMainFunc(), in main.c, opens the handles to the board for image capture (VCAP_open()) and to the display (VCAP_open()) and calls the grayscale function. Here, several data structures are instantiated that are defined in the file img_proc.h. The IMAGE structures will point to the data that is captured by the FPGA and the data that will be output to the display. The SCRATCH_PAD structure points to our internal and external memory buffers used for temporary storage during processing. LPF_PARAMS is used to store filter coefficients for the low pass filter.

The call to img_proc() takes us to the file img_proc.c. First, several variables are declared and defined. The variable quadrant will denote on which quadrant of the screen we currently want output; out_ptr will point to the current output spot in the output image; and pitch refers to the byte offset between two lines. This function is the high level control for our image-processing algorithm. See algorithm flow (Figure 2.27).


Figure 2.27: Algorithm flow.

The first function called is the pre_scale_image function in the file pre_scale_image.c. The purpose of this function is to take the 640x480 image and scale it down to a quarter of its size by first downsampling the input rows by two and then averaging every two pixels horizontally. The internal and external memory spaces in the scratch pad are used for this task. The vertical downsampling will occur when only every other line is read into the internal memory from the input image. Within internal memory, we will operate on two lines of data (640 columns/line) at a time, averaging every two pixels (horizontal neighbors) and producing two lines of output (320 columns/line) that are stored in the external memory.
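The effect of the pre-scale step on a whole image can be checked with a few lines of MATLAB. This sketch ignores the line-by-line buffering done by the IDM and uses a synthetic test image; it illustrates the arithmetic only, not the C code in pre_scale_image.c.

```matlab
% Sketch: scale a 480x640 image to 240x320.
in  = repmat(0:639, 480, 1);                    % synthetic 480x640 test image
ev  = in(1:2:end, :);                           % keep every other row (240x640)
out = (ev(:, 1:2:end) + ev(:, 2:2:end)) / 2;    % average horizontal neighbors
[rows_out, cols_out] = size(out);               % resulting image size
```

The output holds one quarter as many pixels as the input, which is what allows four images to share the monitor.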

To accomplish this, we will need to take advantage of the IDM by initializing the input and output streams. At the start of the function, two instantiations of a new structure dstr_t are declared. You can view the structure contents of dstr_t on p. 2-11 of the IDK Programmer's Guide[?]. The structure contents are defined with calls to dstr_open(). This data flow for the pre-scale is shown in data flow (Figure 2.28).


Figure 2.28: Data flow of input and output streams.

To give you a better understanding of how these streams are created, let's analyze the parameters passed in the first call to dstr_open():

2.6.2.3.1 External address: in_image->data

This is a pointer to the place in memory serving as the source of our input data (it's the source because the last function parameter is set to DSTR_INPUT).

2.6.2.3.2 External size: (rows + num_lines) * cols = (240 + 2) * 640

This is the total size of our input data. We will only be taking every other line from in_image->data, so only 240 rows. The extra two rows serve as a buffer.

2.6.2.3.3 Internal address: int_mem

This is a pointer to an 8x640 lexicographic array, specifically scratchpad->int_data. This is where we will be putting the data on each call to dstr_get().

2.6.2.3.4 Internal size: 2 * num_lines * cols = 2 * 2 * 640

The size of space available for data to be input into int_mem from in_image->data. Because double buffering is used, num_lines is set to 2.

2.6.2.3.5 Number of bytes/line: cols = 640, Number of lines: num_lines = 2

Each time dstr_get() is called, it will return a pointer to 2 lines of data, 640 bytes in length.


2.6.2.3.6 External memory increment/line: stride*cols = 1*640

Left as an exercise.

2.6.2.3.7 Window size: 1 for double buffered

The need for the window size is not really apparent here. It will become apparent when we do the 3x3 block convolution. Then, the window size will be set to 3. This tells the IDM to send a pointer to 3 lines of data when dstr_get() is called, but only increment the stream's internal pointer by 1 (instead of 3) the next time dstr_get() is called. This is not a parameter when setting up an output stream.

2.6.2.3.8 Direction of input: DSTR_INPUT

Sets the direction of data flow. If it had been set to DSTR_OUTPUT (as done in the next call to dstr_open()), we would be setting the data to flow from the Internal Address to the External Address.
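To make the window-size idea concrete, here is a toy model of a line stream in plain C. It is only a sketch: the type line_stream and the function stream_get are hypothetical names invented for this illustration, they are not the IDM dstr_t structure or dstr_get(), and the toy simply returns pointers into one array instead of moving data between external and internal memory.

```c
#include <stddef.h>

/* Toy model of a line stream (hypothetical; not the IDM dstr_t structure). */
typedef struct {
    const unsigned char *base; /* start of the image data           */
    size_t cols;               /* bytes per line                    */
    size_t rows;               /* total number of lines available   */
    size_t window;             /* lines visible per call            */
    size_t next;               /* index of the first line to return */
} line_stream;

/* Return a pointer to `window` consecutive lines, then advance by one line.
 * Returns NULL once fewer than `window` lines remain. */
const unsigned char *stream_get(line_stream *s)
{
    const unsigned char *p;
    if (s->next + s->window > s->rows)
        return NULL;
    p = s->base + s->next * s->cols;
    s->next += 1; /* slide by 1 line, not by `window` lines */
    return p;
}
```

With a window of 3, successive calls return lines 0-2, then 1-3, then 2-4, which is the overlap a 3x3 convolution needs.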

Once our data streams are set up, we can begin processing by calling the component function pre_scale() (in pre_scale.c) to operate on one block of data at a time. This function will perform the horizontal scaling by averaging every two pixels. This algorithm operates on four pixels at a time. The entire function is iterated within pre_scale_image() 120 times, since each call processes two of the 240 rows in a quadrant. Before pre_scale_image() exits, the data streams are closed, and one line is added to the top and bottom of the image to provide context necessary for the next processing steps. Now that the input image has been scaled to a quarter of its initial size, we will proceed with the four image processing algorithms. In img_proc.c, the set_ptr() function is called to set the variable out_ptr to point to the correct quadrant on the 640x480 output image. Then copy_image(), in copy_image.c, is called, performing a direct copy of the scaled input image into the lower right quadrant of the output.

Next we will set the out_ptr to point to the upper right quadrant of the output image and call conv3x3_image() in conv3x3_image.c. As with pre_scale_image(), the _image indicates this is only the wrapper function for the ImageLIB component, conv3x3(). As before, we must set up our input and output streams. This time, however, data will be read from the external memory, into internal memory for processing, and then written to the output image. Iterating over each row, we compute one line of data by calling the component function conv3x3() in conv3x3.c.

In conv3x3(), you will see that we perform a 3x3 block convolution, computing one line of data with the low pass filter mask. Note here that the variables IN1[i], IN2[i], and IN3[i] all grab only one pixel at a time. This is in contrast to the operation of pre_scale() where the variable in_ptr[i] grabbed 4 pixels at a time. This is because in_ptr was of type unsigned int, which implies that it points to four bytes of data at a time. IN1, IN2, and IN3 are all of type unsigned char, which implies they point to a single byte of data. In block convolution, we are computing the value of one pixel by placing weights on a 3x3 block of pixels in the input image and computing the sum. What happens when we are trying to compute the rightmost pixel in a row? The computation is now bogus. That is why the wrapper function copies the last good column of data into the two rightmost columns. You should also note that the component function ensures output pixels will lie between 0 and 255.
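The following sketch shows the shape of such a computation for one output line. It is not the ImageLIB conv3x3() source: the function name conv3x3_line, the mask layout, and the use of a right shift for scaling are assumptions made for illustration. Notice that the loop stops two pixels short of the end of the line, which is why the last two columns have to be filled in separately.

```c
/* Hypothetical sketch (not the ImageLIB conv3x3() source): filter one output
 * line from three input lines using a 3x3 mask, then scale by a right shift. */
void conv3x3_line(const unsigned char *in1, const unsigned char *in2,
                  const unsigned char *in3, unsigned char *out,
                  int cols, const signed char mask[9], int shift)
{
    int i;
    for (i = 0; i + 2 < cols; i++) {
        int sum = in1[i] * mask[0] + in1[i + 1] * mask[1] + in1[i + 2] * mask[2]
                + in2[i] * mask[3] + in2[i + 1] * mask[4] + in2[i + 2] * mask[5]
                + in3[i] * mask[6] + in3[i + 1] * mask[7] + in3[i + 2] * mask[8];
        if (sum < 0)   sum = 0;    /* clamp low before shifting */
        sum >>= shift;
        if (sum > 255) sum = 255;  /* clamp high so the result fits in a byte */
        out[i] = (unsigned char)sum;
    }
}
```

For example, the mask {1, 2, 1, 2, 4, 2, 1, 2, 1} with a shift of 4 (divide by 16) leaves a uniform region unchanged.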

Back in img_proc.c, we can begin the edge detection algorithm, sobel_image(), for the lower left quadrant of the output image. This wrapper function, located in sobel_image.c, performs edge detection by utilizing the assembly-written component function sobel() in sobel.asm. The wrapper function is very similar to the others you have seen and should be straightforward to understand. Understanding the assembly file is considerably more difficult since you are not familiar with the assembly language for the c6711 DSP. As you'll see in the assembly file, the comments are very helpful since an "equivalent" C program is given there.

The Sobel algorithm convolves two masks with a 3x3 block of data and sums the results to produce a single pixel of output. This algorithm approximates a 3x3 nonlinear edge enhancement operator. The brightest edges in the result represent a rapid transition (well-defined features), and darker edges represent smoother transitions (blurred or blended features).
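A minimal C sketch of one output pixel is given below. It assumes the standard Sobel masks and assumes that the absolute values of the two mask responses are summed and clamped to 255; check the C comments in sobel.asm for the exact form the component uses. The function name sobel_pixel is made up for this example.

```c
#include <stdlib.h>

/* Hypothetical sketch of one Sobel output pixel (not the sobel.asm routine).
 * r0, r1, r2 point at the leftmost pixel of the three rows of a 3x3 block. */
unsigned char sobel_pixel(const unsigned char *r0, const unsigned char *r1,
                          const unsigned char *r2)
{
    /* Horizontal-gradient mask: [-1 0 1; -2 0 2; -1 0 1] */
    int gx = (r0[2] - r0[0]) + 2 * (r1[2] - r1[0]) + (r2[2] - r2[0]);
    /* Vertical-gradient mask:   [-1 -2 -1; 0 0 0; 1 2 1] */
    int gy = (r2[0] - r0[0]) + 2 * (r2[1] - r0[1]) + (r2[2] - r0[2]);
    int mag = abs(gx) + abs(gy);   /* sum of the two mask responses */
    return (unsigned char)(mag > 255 ? 255 : mag);
}
```

A flat block gives 0 (black), while a sharp vertical edge saturates at 255 (white).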


2.6.2.4 Using the IDK Environment

This section provides a hands-on introduction to the IDK environment that will prepare you for the lab exercise. First, connect the power supply to the IDK module. Two green lights on the IDK board should be illuminated when the power is connected properly.

You will need to create a directory img_proc for this project in your home directory. Enter this new directory, and then copy the following files as follows (again, be sure you're in the directory img_proc when you do this):

copy V:\ece320\idk\c6000\IDK\Examples\NTSC\img_proc
copy V:\ece320\idk\c6000\IDK\Drivers\include
copy V:\ece320\idk\c6000\IDK\Drivers\lib

After the IDK is powered on, open Code Composer 2 by clicking on the "CCS 2" icon on the desktop. From the "Project" menu, select "Open," and then open img_proc.pjt. You should see a new icon appear at the menu on the left side of the Code Composer window with the label img_proc.pjt. Double click on this icon to see a list of folders. There should be a folder labeled "Source." Open this folder to see a list of program files.

The main.c program calls the img_proc.c function that displays the output of four image processing routines in four quadrants on the monitor. The other files are associated with the four image processing routines. If you open the "Include" folder, you will see a list of header files. To inspect the main program, double click on the main.c icon. A window with the C code will appear to the right.

Scroll down to the tskMainFunc() in the main.c code. A few lines into this function, you will see the line LOG_printf(&trace,"Hello\n");. This line prints a message to the message log, which can be useful for debugging. Change the message "Hello\n" to "Your Name\n" (the "\n" is a newline character). Save the file by clicking the little floppy disk icon at the top left corner of the Code Composer window.

To compile all of the files when the ".out" file has not yet been generated, you need to use the "Rebuild All" command. The rebuild all command is accomplished by clicking the button displaying three little red arrows pointing down on a rectangular box. This will compile every file the main.c program uses. If you've only changed one file, you only need to do an "Incremental Build," which is accomplished by clicking on the button with two little blue arrows pointing into a box (immediately to the left of the "Rebuild All" button). Click the "Rebuild All" button to compile all of the code. A window at the bottom of Code Composer will tell you the status of the compiling (i.e., whether there were any errors or warnings). You might notice some warnings after compilation - don't worry about these.

Click on the "DSP/BIOS" menu, and select "Message Log." A new window should appear at the bottom of Code Composer. Assuming the code has compiled correctly, select "File" -> "Load Program" and load img_proc.out (the same procedure as on the other version of Code Composer). Now select "Debug" -> "Run" to run the program (if you have problems, you may need to select "Debug" -> "Go Main" before running). You should see image processing routines running on the four quadrants of the monitor. The upper left quadrant (quadrant 0) displays a low pass filtered version of the input. The low pass filter passes the smooth features in the image and attenuates the fine detail, resulting in a slightly blurred image. The operation of the low pass filter code, and how data is moved to and from the filtering routine, was described in detail in the previous section. The lower left quadrant (quadrant 2) displays the output of an edge detection algorithm. The top right and bottom right quadrants (quadrants 1 and 3, respectively) show the original input displayed unprocessed. At this point, you should notice your name displayed in the message log.

2.6.2.5 Implementation

You will create the component code flip_invert.c to implement an algorithm that horizontally flips and inverts the input image. The code in flip_invert.c will operate on one line of the image at a time. The copyim.c wrapper will call flip_invert.c once for each row of the prescaled input image. The flip_invert function call should appear as follows:

flip_invert(in_data, out_data, cols);

where in_data and out_data are pointers to the input and output buffers in internal memory, and cols is the number of columns (pixels per line) in the prescaled image.

The img_proc.c function should call the copyim.c wrapper so that the flipped and inverted image appears in the top right (first) quadrant. The call to copyim is as follows:

copyim(scratch_pad, out_img, out_ptr, pitch);

This call is commented out in the img_proc.c code. The algorithm that copies the image (unprocessed) to the screen is currently displayed in quadrant 1, so you will need to comment out its call and replace it with the call to copyim.

Your algorithm should flip the input picture horizontally, such that someone on the left side of the screen looking left in quadrant 3 will appear on the right side of the screen looking right. This is similar to putting a slide in a slide projector backwards. The algorithm should also invert the picture, so that something white appears black and vice versa. The inversion portion of the algorithm is like looking at the negative for a black and white picture. Thus, the total effect of your algorithm will be that of looking at the wrong side of the negative of a picture.

note: Pixel values are represented as integers between 0 and 255.

To create a new component file, write your code in a file called "flip_invert.c". You may find the component code for the low pass filter in "conv3x3_c.c" helpful in giving you an idea of how to get started. To compile this code, you must include it in the "img_proc" project, so that it appears as an icon in Code Composer. To include your new file, right click on the "img_proc.pjt" icon in the left window of Code Composer, and select "Add Files."


Chapter 3

General References

3.1 Processor

3.1.1 Two's Complement and Fractional Arithmetic for 16-bit Processors1

3.1.1.1 Two's-complement notation

Two's-complement notation is an efficient way of representing signed numbers in microprocessors. It offers the advantage that addition and subtraction can be done with ordinary unsigned operations. When a number is written in two's complement notation, the most significant bit of the number represents its sign: 0 means that the number is positive, and 1 means the number is negative. A positive number written in two's-complement notation is the same as the number written in unsigned notation (although the most significant bit must be zero). A negative number can be written in two's complement notation by inverting all of the bits of its absolute value, then adding one to the result.

Example 3.1

Consider the following four-bit two's complement numbers (in binary form):

1 = 0001 −1 = 1110 + 1 = 1111

2 = 0010 −2 = 1101 + 1 = 1110

6 = 0110 −6 = 1001 + 1 = 1010

8 = 1000 −8 = 0111 + 1 = 1000

Table 3.1

note: 1000 represents -8, not 8. This is because the topmost bit (the sign bit) is 1, indicating that the number is negative.

The maximum number that can be represented with a k-bit two's-complement notation is $2^{k-1} - 1$, and the minimum number that can be represented is $-2^{k-1}$. The maximum integer that can be represented in a 16-bit memory register is 32767, and the minimum integer is -32768.
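The invert-and-add-one rule can be tried out directly in C. The sketch below is only an illustration (the function names are invented here); it works on unsigned types so that wraparound is well defined, and the k-bit version masks the result to k bits to mimic a narrower register.

```c
#include <stdint.h>

/* Negate a 16-bit value as the text describes: invert every bit,
 * then add one. */
uint16_t twos_negate16(uint16_t x)
{
    return (uint16_t)(~x + 1u);
}

/* Same operation restricted to the low k bits (1 <= k <= 16). */
uint16_t twos_negate(uint16_t x, unsigned k)
{
    uint16_t mask = (uint16_t)((UINT32_C(1) << k) - 1u);
    return (uint16_t)((~x + 1u) & mask);
}
```

With k = 4, negating 6 (0110) gives 1010, and negating 8 (1000) gives 1000 again, which is why 1000 has to be read as -8.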

1This content is available online at <http://cnx.org/content/m10808/2.9/>.


3.1.1.2 Fractional arithmetic

The DSP microprocessor is a 16-bit integer processor with some extra support for fractional arithmetic. Fractional arithmetic turns out to be very useful for DSP programming, since it frees us from worries about overflow on multiplies. (Two 16-bit numbers, multiplied together, can require 32 bits for the result. Two 16-bit fixed-point fractional numbers also require 32 bits for the result, but the 32-bit result can be rounded into 16 bits while only introducing an error of approximately $2^{-16}$.) For this reason, we will be using fixed-point fractional representation to describe filter taps and inputs throughout this course.

Unfortunately, the assembler and debugger we are using do not recognize this fractional fixed-point representation. For this reason, when you are using the assembler or debugger, you will see decimal values (ranging from -32768 to 32767) on screen instead of the fraction being represented. The conversion is simple; the fractional number being represented is simply the decimal value shown divided by 32768. This allows us to represent numbers between -1 and $1 - 2^{-15}$.

note: 1 cannot be represented exactly.
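If you want to check a value by hand, the conversion can be written in a few lines of C. This is a sketch for use on the PC, not DSP code; the function names are assumptions, and "Q15" is the common name for this 16-bit fractional format rather than a term used in this text.

```c
#include <stdint.h>

/* Convert the 16-bit integer shown by the debugger to the fraction it
 * represents (value / 32768). */
double q15_to_double(int16_t q)
{
    return (double)q / 32768.0;
}

/* Convert a fraction to the 16-bit integer form, saturating at the ends
 * of the range because +1.0 itself is not representable. */
int16_t double_to_q15(double x)
{
    double scaled = x * 32768.0;
    if (scaled >= 32767.0)  return INT16_MAX;
    if (scaled <= -32768.0) return INT16_MIN;
    /* round to nearest */
    return (int16_t)(scaled >= 0 ? scaled + 0.5 : scaled - 0.5);
}
```

For instance, 16384 corresponds to 0.5, and a request for 1.0 saturates to 32767.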

When we multiply using this representation, an extra shift left is required. Consider the two examples below:

Example 3.2

fractional   0.5 × 0.5 = 0.25
decimal      16384 × 16384 = 4096 × $2^{16}$ : 4096/32768 = 1/8
hex          4000 × 4000 = 1000 × $2^{16}$

Table 3.2

fractional   0.125 × 0.75 = 0.093750
decimal      4096 × 24576 = 1536 × $2^{16}$ : 1536/32768 = 0.046875
hex          1000 × 6000 = 0600 × $2^{16}$

Table 3.3

You may wish to use the MATLAB commands hex2dec and dec2hex. When we do the multiplication, we are primarily interested in the top 16 bits of the result, since these are the data that are actually used when we store the result back into memory and send it out to the digital-to-analog converter. (The entire result is actually stored in the accumulator, so rounding errors do not accumulate when we do a sequence of multiply-accumulate operations in the accumulators.) As the example above shows, the top 16 bits of the result of multiplying the fixed point fractional numbers together is half the expected fractional result. The extra left shift multiplies the result by two, giving us the correct final product.

The left-shift requirement can alternatively be explained by way of decimal place alignment. Remember that when we multiply decimal numbers, we first multiply them ignoring the decimal points, then put the decimal point back in the last step. The decimal point is placed so that the total number of digits right of the decimal point in the multiplier and multiplicand is equal to the number of digits right of the decimal point in their product. The same applies here; the "decimal point" is to the right of the leftmost (sign) bit, and there are 15 bits (digits) to the right of this point. So there are a total of 30 bits to the right of the decimal in the source. But if we do not shift the result, there are 31 bits to the right of the decimal in the 32-bit result. So we shift the number to the left by one bit, which effectively reduces the number of bits right of the decimal to 30.
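The two worked examples can be reproduced with a short C model of the multiply. This is a sketch, not a description of the DSP hardware: frac_mul is a made-up name, the result is truncated rather than rounded, and no saturation is applied, so the one overflow case (-1 times -1) wraps around.

```c
#include <stdint.h>

/* Multiply two 16-bit fractional values: form the 32-bit product, apply the
 * extra one-bit left shift, and keep the top 16 bits. */
int16_t frac_mul(int16_t a, int16_t b)
{
    int32_t  product = (int32_t)a * (int32_t)b;  /* 0x4000 * 0x4000 = 0x10000000 */
    uint32_t shifted = (uint32_t)product << 1;   /* extra left shift             */
    return (int16_t)(uint16_t)(shifted >> 16);   /* top 16 bits of the result    */
}
```

Here frac_mul(16384, 16384) returns 8192, which is 0.25 after dividing by 32768, and frac_mul(4096, 24576) returns 3072, which is 0.09375.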


Before the numbers are multiplied by the ALU, each term is sign-extended generating a 17-bit number from the 16-bit input. Because the examples presented above are all positive, the effect of this sign extension is simply adding an extra "0" bit at the top of the register (i.e., positive numbers are not affected by the sign extension). As the following example illustrates, not including this sign-bit for negative numbers produces erroneous results.

fractional   −0.5 × 0.5 = −0.25
decimal      49152 × 16384 = 12288 × $2^{16}$ : 12288/32768 = 0.375
hex          C000 × 4000 = 30000000 = 3000 × $2^{16}$

Table 3.4

Note that even after the result is left-shifted by one bit following the multiply, the top bit of the result is still "0", implying that the result is incorrectly interpreted as a positive number.

To correct this problem, the ALU sign-extends negative multipliers and multiplicands by placing a "1" instead of a "0" in the added bit. This is called sign extension because the sign bit is "extended" to the left another place, adding an extra bit to the left of the number without changing the number's value.

fractional   −0.5 × 0.5 = −0.25
hex          1C000 × 4000 = 70000000 = 7000 × $2^{16}$

Table 3.5

Although the top bit of this result is still "0", after the final 1-bit left-shift the result is E000 0000h, which is a negative number (the top bit is "1"). To check the final answer, we can negate the product using the two's complement method described above. After flipping all of the bits we have 1FFF FFFFh, and adding one yields 2000 0000h, which equals 0.25 when interpreted as a 32-bit fractional number.
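The difference sign extension makes can be seen by running both versions of the multiply in C. This is an illustrative sketch with invented function names; it models sign extension by converting to a signed 16-bit type (which wraps as expected on ordinary compilers) instead of building a 17-bit value, so the intermediate product differs from the table, but the shifted 32-bit result is the same.

```c
#include <stdint.h>

/* Operands treated as plain unsigned 16-bit values (no sign extension). */
uint32_t mul_no_sign_extend(uint16_t a, uint16_t b)
{
    return ((uint32_t)a * (uint32_t)b) << 1;
}

/* Operands sign-extended before the multiply. */
uint32_t mul_sign_extend(uint16_t a, uint16_t b)
{
    int32_t product = (int32_t)(int16_t)a * (int32_t)(int16_t)b;
    return (uint32_t)product << 1;
}
```

For C000h times 4000h, the first function yields 6000 0000h (a positive number), while the second yields E000 0000h, matching the negative result described above.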

3.1.2 Addressing Modes for TI TMS320C54x2

Microprocessors provide a number of ways to specify the location of data to be used in calculations. For example, one of the data values to be used in an add instruction may be encoded as part of that instruction's opcode, the raw machine language produced by the assembler as it parses your assembly language program. This is known as immediate addressing. Alternatively, perhaps the opcode will instead contain a memory address which holds the data (direct addressing). More commonly, the instruction will specify that an auxiliary register holds the memory address which in turn holds the data (indirect addressing). The processor knows which addressing mode is being used by examining special bit fields in the instruction opcode.

Knowing the basic addressing modes of your microprocessor is important because they map directly into assembly language syntax. Many annoying and sometimes hard-to-find bugs are caused by inadvertently using the wrong addressing mode in an instruction. Also, in any assembly language, the need to use a particular addressing mode often dictates which instruction one picks for a given task.

Chapter five, Data Addressing, in the CPU and Peripherals[?] reference contains extended descriptions of most of the addressing modes described below.

2This content is available online at <http://cnx.org/content/m10806/2.7/>.


3.1.2.1 Accumulators: src, dst

Whenever the abbreviations src or dst are used in the assembly language syntax description for an instruction, it means that only the accumulators A and B may be used for that particular operand. These are seen everywhere, but two classic examples are ld, which always loads data into an accumulator from somewhere else, and sth/stl, which always store data from an accumulator to somewhere else.

Examples:

ld *AR5,A ; sets A = (contents of memory location pointed to by AR5)
sth B,*AR7+ ; sets (contents of memory location pointed to by AR7) = high word of B,
            ; and then increments AR7 by one

3.1.2.2 Memory-mapped Registers: MMR, MMRx, MMRy

Many of the TMS320C54x registers are memory-mapped, meaning that they occupy real addresses at the low end of data memory space. The most commonly used of these are the auxiliary registers AR0 through AR7. Whenever the abbreviation MMR is used in the assembly language syntax description for an instruction, it means that any memory-mapped register may be used for that particular operand. Only eight instructions use memory-mapped register addressing: ldm, mvdm, mvmd, mvmm, popm, pshm, stlm, and stm. With mvmm, since the instruction accepts two memory-mapped register operands, MMRx and MMRy, only AR0-AR7 and SP may be used. Do not use an asterisk in front of ARx variables here, since this is not indirect addressing.

Examples:

mvmm AR3,AR5 ; sets AR5 = AR3

stm #5,AR2 ; sets AR2 = 5

ldm AR0,A ; sets A = AR0

3.1.2.3 Immediate Addressing: #k3, #k5, K, #k9, #lk

Immediate addressing means that the numerical value of the data is itself provided within the assembly instruction. Various TMS320C54x instructions allow immediate data of 3, 5, 8, 9, or 16 bits in length, which are signified in the assembly language syntax descriptions with one of the above symbols. The 16-bit form is the most common and is signified by #lk. 16-bit immediate values always require an extra instruction word and therefore take an extra machine cycle to execute.

An immediate data operand is almost always specified in assembler syntax by prepending a pound sign (#) to the data. Depending on the context, the assembler may assume that you meant immediate addressing anyway.

Examples:

ld #0,A ; sets A = 0

cmpm AR1,#1 ; sets flag TC = 1 if AR1 == 1; else TC = 0


Labels make this more complicated. Recall that a label in your assembly code is nothing more than shorthand for the memory address where the labeled code or data is stored. So does an instruction like

stm coef,AR2 ; sets AR2 = memory address of label coef

mean to store the contents of memory location coef in AR2, or does it mean to store the memory address coef itself in AR2? The second interpretation is correct. Because the stm instruction has only one form, expecting a #lk immediate operand, the assembler does not care whether the label is prefixed with a pound sign or not. Still, it would have been better for us to include the pound sign in the above example for clarity.

Many instructions have several versions allowing the use of different addressing modes (see ld for a good example of this). With these instructions, including the pound sign is not optional when specifying immediate addressing. The only safe rule, then, is always to prefix the label with a pound sign if you wish to specify the memory address of the label and not the contents of that address.

If you are not sure how a particular instruction has been assembled, you can always examine the .lst file produced by the assembler, and compare the hexadecimal opcodes listed to the left of the assembly instructions with the assembly opcodes given in the assembly language manual (Chapter 4 of the Mnemonic Instruction Set[?] reference).

3.1.2.4 Direct Addressing: Smem and others

In the modes called direct addressing by TI, the instruction opcode contains a memory offset (see the "dma" bits on page 5-8 of the CPU and Peripherals[?] reference) seven bits long, which is combined with either the DP (data-page pointer) or SP (stack pointer) register to obtain a complete 16-bit data-memory address. This divides the data memory into pages of 128 words each.

SP is initialized for you in the core file and should not need to be modified. SP-referenced direct addressing is used by the pshd, pshm, popd, and popm instructions for stack manipulation, as well as by all subroutine calls and returns, which save program addresses on the stack.

DP-referenced direct addressing is available wherever you see the Smem abbreviation in an assembly syntax description. The advantage of DP-referenced addressing over the *(lk) form described in the next section is that DP-referenced addressing will not add an extra instruction word (and corresponding extra machine cycle). The disadvantage is that it is limited to 128 words of contiguous memory, and you have to make sure that DP points to the right 128 words. DP may be changed with the ld instruction as needed.

Examples:

ld 10,A ; sets A = (contents of memory location DP + 10)

add 6,B ; sets B = B + (contents of memory location DP + 6)

note: Make sure you understand that the numbers 10 and 6 above are interpreted as memory addresses, not data values. To get data values, you would need to use a pound sign in front of the numbers.
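One way to picture DP-referenced addressing is as bit concatenation: a page number selects one of the 128-word pages, and the seven-bit offset from the opcode selects a word within it. The C sketch below models that on the PC. It assumes the page number fills the upper nine bits of the 16-bit address (16 minus 7), and the function name is hypothetical.

```c
#include <stdint.h>

/* Form a 16-bit data-memory address from a page number and a 7-bit offset.
 * Assumption: the page number occupies the upper 9 bits of the address. */
uint16_t dp_direct_address(uint16_t dp, uint16_t dma)
{
    return (uint16_t)(((dp & 0x1FFu) << 7) | (dma & 0x7Fu));
}
```

With page 2 and offset 6 the address is 2 * 128 + 6 = 262. An offset wider than seven bits is silently truncated, which is exactly the kind of bug that appears when DP points at the wrong page.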

3.1.2.5 Absolute Addressing: dmad, pmad, *(lk)/Smem

This seems to be TI's term for all the forms of direct addressing which it does not call direct addressing! It is represented in assembly-instruction syntax definitions using one of the above abbreviations (*(lk) addressing is available when the syntax definition says Smem).


3.1.2.5.1 dmad

dmad (Data Memory ADdress) operands are used by mvxx data move instructions and represent 16-bit memory addresses in data memory whose contents are used in the instruction.

Example:

f3ptr .word 0 ; reserve one word of storage; initialize to 0

. . . .

mvdm f3ptr,AR4 ; sets AR4 = (contents of memory location f3ptr)

3.1.2.5.2 pmad

pmad (Program Memory ADdress) operands are used by the firs, macd, macp, mvdp, and mvpd instructions, as well as all subroutine calls and branching instructions. They represent 16-bit addresses in program memory whose contents are used in the instruction, or jumped to in the case of branch instructions. Other than subroutine calls and branches, the most common use of a pmad is for the firs instruction.

Example:

firs *AR3+,*AR4+,coefs

note: coefs is a label in the program section of the code, not the data section.

3.1.2.5.3 *(lk)

*(lk) addressing is a syntactic oddity. The asterisk symbol generally means that indirect addressing is being used (see below), but this is actually direct addressing with a 16-bit data memory address encoded in the instruction's last word. The reason for the asterisk is that TI does set the "I" bit in the opcode, usually denoting indirect addressing, and this form can only be used when an Smem is called for in the assembly syntax. Other bits in the low byte of the first instruction word tell the processor that the "*(lk) exception" is to be used, and to fetch the memory address in the next word (see the MOD bits on page 5-10 of the CPU and Peripherals[?] reference). You can easily recognize this addressing mode in .lst files because the low byte of the first instruction word always equals F8h.

Examples:

hold .word 1 ; reserve one word of storage and initialize to 1

count .word 0 ; reserve one word of storage and initialize to 0

. . . .

ld *(count),B ; sets B = 0 (assuming memory was not changed)

st T,*(hold) ; sets (storage location at address hold) = T


3.1.2.6 Indirect Addressing: Smem, Xmem, Ymem

Indirect addressing on the TMS320C54x always uses the auxiliary registers AR0 through AR7 and comes in two basic flavors. These are easily recognized from the assembly language syntax descriptions as either Smem or Xmem/Ymem.

3.1.2.6.1 Smem

In Smem indirect addressing, only one indirect address is used in the instruction and a number of variations are possible (see the table on page 5-13 of the CPU and Peripherals[?] reference). An asterisk is always used, which signifies indirect addressing. Any of the registers AR0-AR7 may be used, with optional modifications: automatic post-decrement by one, pre- and post-increment by one, post-increment and post-decrement by n (n being stored in AR0), and more, including many options for circular addressing (which automatically implements circular buffers) and bit-reversed addressing (which is useful for FFTs).

3.1.2.6.2 Xmem/Ymem

Xmem/Ymem indirect addressing is generally used in instructions that need two different indirect addresses, although there are a few instances where an Xmem by itself is specified in order to save bits in the opcode for other options. In Xmem/Ymem indirect addressing, fewer bits are used to encode the option modifiers in the opcode; hence, fewer options are available: post-increment by one, post-decrement by one, and post-increment by AR0 with circular addressing.

Examples:

stl B,*AR6 ; sets (contents of location pointed to by AR6) = low word of B

stl B,*AR6+0% ; sets (contents of location pointed to by AR6) = low word of B,

; then increments AR6 with circular addressing

mar *+AR3(-6) ; decrements AR3 by 6 (increment by -6)

note: The mar (modify address register) instruction is unusual in the sense that it takes an Smem operand but does nothing with the data pointed to by the ARx register. Its purpose is to perform any of the allowed register modifications discussed above without having to do anything else. This is often handy when you are using an Xmem/Ymem-type instruction but need to do an ARx modification that is only allowed with an Smem-type operand.
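The idea behind circular addressing can be modeled in C as an index that wraps around when it passes the end of a buffer. The sketch below is a generic illustration with an invented function name; it does not reproduce how the TMS320C54x actually computes circular addresses, which is described in the CPU and Peripherals reference.

```c
#include <stddef.h>

/* Generic circular post-increment: advance index by step and wrap it into
 * the range [0, size). step may be negative; size must be greater than 0. */
size_t circ_advance(size_t index, long step, size_t size)
{
    long next = ((long)index + step) % (long)size;
    if (next < 0)
        next += (long)size;
    return (size_t)next;
}
```

In an 8-word buffer, stepping forward by 3 from index 6 wraps to index 1, and stepping back by 3 from index 1 wraps to index 6.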


3.1.2.7 Summary

The ld instruction is illustrative of the many possible addressing modes which can be selected with the proper choice of assembly language syntax:

ld #0,A ; immediate data: sets A = 0

ld 0,A ; DP-referenced direct: sets A = (contents of the address DP + 0)

ld mydata,A ; DP-referenced direct: sets A = (contents of the address

; DP + lower seven bits of mydata)

ld #mydata,A ; immediate data: sets A = 16 bit address mydata

ld *(mydata),A ; *(lk) direct: sets A = (contents of the 16 bit address mydata)

ld B,A ; accumulator: sets A = B

ld *AR1+,A ; indirect: sets A = (contents of address pointed to by AR1),

; and afterwards increments AR1 by one

ldm AR2,A ; memory-mapped register: sets A = AR2

3.2 Core File

3.2.1 Core File: Introduction to Six-Channel Board for TI EVM320C543

3.2.1.1 The Six Channel Surround Sound Board

The six-channel board attaches to the top of the DSP evaluation module and replaces its onboard, one-channel A/D and D/A with six channels of D/A and two channels of A/D running at a sample rate of 44.1 kHz. Alternatively, the A/D can be disabled and an S/PDIF digital input enabled, allowing PCM digital data from a CD or DVD to be sent directly into the DSP for processing. The two input channels and six output channels all sample at the same time; clock skew between channels is not an issue. By default, the core code reads and writes blocks of 64 samples for each channel of input and output; however, this aggregation can be changed to any value between 1 and 80 samples4. If your code needs a larger aggregation of samples - for instance, for a 256 point FFT - you will need to do this aggregation yourself.
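The bookkeeping needed to build a longer frame out of 64-sample blocks is small. The sketch below shows the idea in C for readability; your DSP code would do the equivalent in assembly. The names add_block, frame, and filled are hypothetical and are not part of the core code.

```c
#include <string.h>

#define BLOCK_LEN 64    /* samples delivered per block (core-code default) */
#define FRAME_LEN 256   /* samples wanted, e.g. for a 256-point FFT        */

static short frame[FRAME_LEN];
static int   filled = 0;

/* Hypothetical helper: append one block; return 1 when frame[] is full. */
int add_block(const short *block)
{
    memcpy(&frame[filled], block, BLOCK_LEN * sizeof(short));
    filled += BLOCK_LEN;
    if (filled == FRAME_LEN) {
        filled = 0;      /* the next block starts a new frame */
        return 1;
    }
    return 0;
}
```

Every fourth call returns 1, at which point frame[] holds 256 consecutive samples and can be processed before the next block overwrites its start.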

Other features include buffered serial communication code, which allows you to send and receive data from the serial port. This can be used to control your DSP program with a graphical user-interface on the PC; it can also be used to report status back to the PC for applications such as speech recognition.

The core code, core.asm [5] (which requires globals.inc [6], ioregs.inc [7], and misc.inc [8]), also initializes the DSP itself. It enables the fractional arithmetic mode for the ALU, programs the wait states for the external memory, and sets the DSP clock to 80 MHz [9].

3.2.1.1.1 Testing the six-channel sample code

We will start with a sample application, which simply sends the inputs into the outputs, relaying both the audio inputs from the A/D converters to the D/A converters, and any data that comes in on the serial port back to the PC. To familiarize yourself with this sample application, locate a copy of thru6.asm [10] and assemble it.

[3] This content is available online at <http://cnx.org/content/m10513/2.13/>.

[4] The upper bound is determined by the amount of memory available to the auto-buffering unit.

[5] http://cnx.rice.edu/author/workgroups/90/m10017/core.asm

[6] http://cnx.org/content/m10513/latest/globals.inc

[7] http://cnx.org/content/m10513/latest/ioregs.inc

[8] http://cnx.org/content/m10513/latest/misc.inc

[9] The DSP is rated to run at 100 MHz; however, the serial port does not work reliably when the DSP clock speed is greater than 80 MHz.

Once you have done that, start Code Composer. Since we are using the on-chip RAM on the TMS320C549 to hold program code, we need to map that RAM into program space before we can load our code. This can be done by opening the CPU Registers window (the same one you use to look at the ARx registers and the accumulators) and changing the PMST register to 0xFFE0. This sets the OVLY bit to 1, switching the internal RAM into the DSP's program memory space.
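To see which bits the value 0xFFE0 actually sets, the Python sketch below splits a PMST value into a few of its fields. The bit positions are an assumption based on our reading of TI's C54x CPU documentation (IPTR in bits 15-7, MP/MC in bit 6, OVLY in bit 5, AVIS in bit 4, DROM in bit 3) and should be checked against the reference guide; the function name decode_pmst is hypothetical.

```python
def decode_pmst(pmst: int) -> dict:
    """Split a 16-bit PMST value into a few of its fields.

    Bit positions are assumed, not taken from the lab text.
    """
    return {
        "IPTR":  (pmst >> 7) & 0x1FF,  # interrupt vector pointer, bits 15-7
        "MP_MC": (pmst >> 6) & 1,      # microprocessor/microcomputer mode
        "OVLY":  (pmst >> 5) & 1,      # on-chip RAM overlaid into program space
        "AVIS":  (pmst >> 4) & 1,      # address visibility
        "DROM":  (pmst >> 3) & 1,      # on-chip ROM mapped into data space
    }


fields = decode_pmst(0xFFE0)
print(fields["OVLY"])   # the bit the text asks you to set
```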

Finally, load the thru6.out file, use Code Composer's Reset DSP menu option to reset the DSP, and run the code. Probe at the connections with the function generator and the oscilloscope; the inputs and outputs, along with the six-channel board's connector configuration, are shown in Figure 3.1. Note that output channels 1-3 come from input channel 1, and output channels 4-6 come from input channel 2.

Figure 3.1: Six-Channel Board Analog Inputs and Outputs

Also test the serial communications portion of the thru6.asm [11] application. This can be done by starting a provided terminal emulator package (such as Teraterm Pro or HyperTerminal), configuring it to communicate at 38400 bps, with no parity, eight data bits, and one stop bit, and attaching the correct serial port on the computer to the TI TMS320C54x EVM. A serial port is a 9-pin D-shell connector; it is located on the DSP EVM next to the power connector. Typically, there will be two matching D-shell connectors on your computer, often labeled COM1 and COM2; make sure you connect your serial cable to the right one!

Once you have started the terminal emulator, and the emulator has been correctly set to communicate with the DSP board, reload and rerun the thru6.asm [12] application. Once it is running, you should be able to communicate with the DSP by typing text into the terminal emulator's window. Characters that you type should be echoed back; this indicates that the DSP has received and retransmitted the characters. If the DSP is not connected properly or not running, nothing will be displayed as you type. If this happens, check the connections and the terminal emulator configuration, and try again. Due to a terminal emulation quirk, the "Enter" key does not work properly.

After you have verified that the EVM is communicating with the PC, close the terminal window.

3.2.1.2 Memory Maps and the Linker

Because the DSP has separate program and data spaces, you might expect the program and data memory to be independent. However, for the DSP to run at its maximum efficiency, it needs to read its code from on-chip RAM instead of going off-chip; off-chip access requires a one- or two-cycle delay whenever an instruction is read. The 32K words of on-chip RAM, however, are a single memory block; we cannot map one part of it into program space and another part of it into data space. It is possible to configure the DSP so that the on-chip RAM is mapped into both program space and data space, allowing code to be executed from the onboard memory and avoiding the extra delay. Figure 3.2 shows the DSP's memory map with the DSP's on-chip memory mapped into program space.

[10] http://cnx.rice.edu/modules/m10825/latest/thru6.asm

[11] http://cnx.rice.edu/modules/m10825/latest/thru6.asm

[12] http://cnx.rice.edu/modules/m10825/latest/thru6.asm

Figure 3.2: Hardware Memory Map

Because the same on-chip RAM is mapped into both program and data space, you must be careful not to overwrite your code with data or vice versa. To help you, the linker will place the code and data in different parts of the memory map. If you use the .word or .space directives to inform the linker of all of your usage of data memory, you will not accidentally try to reuse memory that has been used to store code or other data. (Remember that .word allocates one memory location and initializes it to the value given as its parameter. .space 16*<words> allocates <words> words of uninitialized storage.) Avoid using syntaxes like stm #2000h,AR3 to point auxiliary registers to specific memory locations directly, as you may accidentally overwrite important code or data. Instead, use syntaxes like stm #hold,AR3, where hold is a label for memory explicitly declared by .word or .space directives.

There are two types of internal memory on the TI TMS320C549 DSP: SARAM (Single Access RAM) and DARAM (Dual Access RAM). The first 8K of internal memory is DARAM; the next 24K is SARAM. The difference between these two types of memory is that while SARAM can only be read or written once in a cycle, DARAM can be read or written twice in a cycle. This is relevant because the TMS320C549 DSP core can access memory up to three times in each cycle: two accesses in Data RAM to read or write operands, and one access in Program RAM to fetch the next instruction. Both DARAM and SARAM are divided into "pages"; access to memory located in different "pages" will never conflict. If, however, two operands are fetched from the same "page" of SARAM (which is divided into 8K word pages: 2000h-3FFFh, 4000h-5FFFh, and 6000h-7FFFh) in the same cycle, a one-cycle stall will occur while the second memory location is accessed. Due to the pipeline, two memory accesses in the same instruction execute in different cycles. However, if two successive instructions access the same area of SARAM, a stall can occur.

Part of the SARAM (from 6000h to 7FFFh) is used for storing your program code; a small amount of SARAM below 6000h is also used for storing the DSP's stack. Part of the DARAM (from 0800h to 0FFFh) is used for the input and output buffers and is also unavailable. Ensure that any code you write does not use any of these reserved sections of data memory. In addition, the core file reserves six locations in scratch-pad RAM (060h to 065h); do not use these locations in your program code.
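The page rule for SARAM can be captured in a few lines. The Python sketch below is a simplified model of the ranges stated in the text only (it ignores the pipeline timing and treats everything below 2000h as DARAM); the function names memory_block and may_stall are hypothetical.

```python
def memory_block(addr: int) -> str:
    """Classify a data address using the ranges given in the text."""
    if 0x0000 <= addr <= 0x1FFF:
        return "DARAM"
    if 0x2000 <= addr <= 0x7FFF:
        # SARAM is split into 8K-word pages starting at 2000h.
        return f"SARAM page {(addr - 0x2000) // 0x2000}"
    return "external"


def may_stall(addr_a: int, addr_b: int) -> bool:
    """True when both accesses land in the same SARAM page."""
    block_a, block_b = memory_block(addr_a), memory_block(addr_b)
    return block_a.startswith("SARAM") and block_a == block_b


print(may_stall(0x2000, 0x3FFF))   # same SARAM page
print(may_stall(0x2000, 0x4000))   # different SARAM pages
```

A practical consequence is that a filter's coefficients and its sample buffer are best kept in different pages (or in DARAM) when both are read in the same cycle.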

3.2.1.3 Sections and the Linker

You can use the .text directive to declare program code, and the .data directive to declare data. However, there are many more sections defined by the linker control file. Note that the core file uses memory in some of these sections.

You can place program code in the following sections using the .sect directive:

• .text: (.sect ".text") SARAM between 6000h and 7FFFh (8192 words)

• .etext: (.sect ".etext") External RAM between 8000h and FEFFh (32,512 words). The test-vector version of the DSP core stores the test vectors in the .etext section.

You can place data in the following sections:

• .data: (.sect ".data") DARAM between 1000h and 1FFFh (4096 words)

• .sdata: (.sect ".sdata") SARAM between 2000h and 5EFFh (16,128 words)

• .ldata: (.sect ".ldata") DARAM between 0080h and 07FFh (1,920 words)

• .scratch: (.sect ".scratch") Scratchpad RAM between 0060h and 007Fh (32 words)

• .edata: (.sect ".edata") External RAM between 8000h and FFFFh (32,768 words) (Requires special initialization; if you need to use this memory, load and run the thru6.asm [13] application before you load your application to initialize the EVM properly.)

If you always use these sections to allocate data storage regions instead of setting pointers to arbitrary locations in memory, you will greatly reduce the chances of overwriting your program code or important data stored at other locations in memory. However, the linker cannot prevent your pointers from being incremented past the end of the memory areas you have allocated.
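When planning buffers it helps to have the section sizes at hand. The Python sketch below simply tabulates the address ranges listed above and derives each section's capacity; the table name SECTIONS and the helper functions are hypothetical and are not part of the linker control file.

```python
# Sections from the lists above: name -> (first address, last address).
SECTIONS = {
    ".text":    (0x6000, 0x7FFF),
    ".etext":   (0x8000, 0xFEFF),
    ".data":    (0x1000, 0x1FFF),
    ".sdata":   (0x2000, 0x5EFF),
    ".ldata":   (0x0080, 0x07FF),
    ".scratch": (0x0060, 0x007F),
    ".edata":   (0x8000, 0xFFFF),
}


def section_words(name: str) -> int:
    """Capacity of a section in 16-bit words."""
    first, last = SECTIONS[name]
    return last - first + 1


def fits(name: str, words: int) -> bool:
    """True if a buffer of `words` words fits in the named section."""
    return words <= section_words(name)


for name in SECTIONS:
    print(name, section_words(name))
```

Note that this only checks a single allocation against an empty section; the linker remains the authority on what actually fits once the core file's own usage is counted.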

Figure 3.3 shows the memory regions and sections defined by the linker control file. Note that the sections defined in the linker control file but not listed above are reserved by the core file and should not be used.

[13] http://cnx.rice.edu/modules/m10825/latest/thru6.asm

Figure 3.3: Linker Memory Map and Section Names

3.2.1.4 Using the Core File

To simplify discussion, we have split the thru6.asm [14] file into two separate files. One, thru6a.asm [15], contains only the code for using the A/D and D/A converters on the six-channel surround board; the other, ser_echo.asm [16], contains only the code to send and receive data from the serial port.

[14] http://cnx.rice.edu/modules/m10825/latest/thru6.asm

[15] http://cnx.org/content/m10513/latest/thru6a.asm

[16] http://cnx.rice.edu/modules/m10821/latest/ser_echo.asm

3.2.1.4.1 Using the D/A and A/D converters

Here we will discuss thru6a.asm [17], which is shown below. ser_echo.asm [18] is discussed in Core File: Serial Port Communication Between MATLAB and TI TMS320C54x (Section 3.2.3).

1 .copy "core.asm"

2

3 .sect ".text"

4 main

5 ; Your initialization goes here.

6

7 loop

8 ; Wait for a new block of 64 samples to come in

9 WAITDATA

10

11 ; BlockLen = the number of samples that come from WAITDATA (64)

12 stm #BlockLen-1, BRC ; Repeat BlockLen=64 times

13 rptb block-1 ; ...from here to the "block" label

14

15 ld *AR6,16, A ; Receive ch1

16 mar *+AR6(2) ; Rcv data is in every other word

17 ld *AR6,16, B ; Receive ch2

18 mar *+AR6(2) ; Rcv data is in every other word

19

20 ; Code to process samples goes here.

21

22 sth A, *AR7+ ; Store into output buffer, ch1

23 sth A, *AR7+ ; ch2

24 sth A, *AR7+ ; ch3

25

26 sth B, *AR7+ ; Store into output buffer, ch4

27 sth B, *AR7+ ; ch5

28 sth B, *AR7+ ; ch6

29

30 block

31 b loop

Line 1 copies in the core code, which initializes the six-channel board and the serial interface, provides the interface macros, and then jumps to "main" in your code. Line 3 declares that what follows should be placed in the program-code area in internal memory.

On Line 4, we find the label "main". This is the entry point for your code; once the DSP has finished initializing the six-channel board and the serial port, the core file jumps to this label.

On Line 9, there is a call to WAITDATA. WAITDATA waits for the next block of 64 samples to arrive from the A/D. When it returns, a pointer to the samples captured by the A/D is returned in AR6 (which can also be referred to as pINBUF); a pointer to the start of the output buffer is returned in AR7 (also pOUTBUF). Note that WAITDATA simply calls the wait_fill subroutine in the core file, which uses the B register internally, along with the DP register and the TC flag; therefore, do not expect the value of B to be preserved across the WAITDATA call.

[17] http://cnx.org/content/m10513/latest/thru6a.asm

[18] http://cnx.rice.edu/modules/m10821/latest/ser_echo.asm

Lines 12 and 13 set up a block repeat. BlockLen is set by the core code as the length of a block; the repeat instruction therefore repeats for every sample time. Lines 15-18 retrieve one sample from each of the two channels; note that the received data is placed in every other memory location. Lines 22-24 place the first input channel into the first three output channels, and lines 26-28 place the second input channel into the last three output channels. Figure 3.1 shows the relationship between the channel numbers shown in the code and the inputs and outputs on the six-channel board.
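The buffer layout implied by Lines 15-28 can be modeled on the host. The Python sketch below is not DSP code; it mirrors the pointer arithmetic of the listing under the assumption that the input buffer holds ch1, an unused word, ch2, and another unused word for each sample time, and that the output buffer holds six consecutive words per sample time. The function name thru6_block is hypothetical.

```python
def thru6_block(inbuf, block_len=64):
    """Model of the thru6a.asm inner loop for one block.

    inbuf holds received words with valid samples in every other
    location: ch1, (unused), ch2, (unused), ... for each sample time.
    Returns the output buffer: six interleaved channels per sample time.
    """
    outbuf = []
    pos = 0                                # plays the role of AR6
    for _ in range(block_len):
        ch1 = inbuf[pos]                   # ld *AR6,16,A
        pos += 2                           # mar *+AR6(2)
        ch2 = inbuf[pos]                   # ld *AR6,16,B
        pos += 2                           # mar *+AR6(2)
        outbuf += [ch1] * 3 + [ch2] * 3    # six "sth ..., *AR7+" stores
    return outbuf


# Two sample times: ch1 = 10, 11 and ch2 = 20, 21.
print(thru6_block([10, 0, 20, 0, 11, 0, 21, 0], block_len=2))
```

Seen this way, a processing routine that replaces the pass-through only needs to change what is written into the six output slots for each sample time.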

Line 31 branches back to the top, where we wait for the next block to arrive and start over.

3.2.1.4.2 Using test vectors

A second version of the core file offers the same interface as the standard core file, but instead of reading input samples from the A/D converters on the six-channel board and sending output samples to the D/A converters, it reads and writes from test vectors generated in MATLAB.

Test vectors provide a method for testing your code with known input. Given this known input and the specifications of the system, we can use simulations to determine the expected output of the system. We can then compare the expected output with the measured output of the system. If the system is functioning properly, the expected output and measured output should match [19].

Testing your system with test vectors may seem silly in some cases, because you can see if simple filters work by looking at the output on the oscilloscope as you change the input frequency. However, they become more useful as you write more complicated code. With more complicated DSP algorithms, testing becomes more difficult; when you correct an error that results in one case not working, you may introduce an error that causes another case to work improperly. This may not be immediately visible if you simply look at the oscilloscope and function generator; the oscilloscope does not display the signal continuously and transient errors may be hidden. In addition, it is easy to forget to check all possible input frequencies by sweeping the function generator after making a change.

More importantly, the test vectors also allow you to test signals that cannot be generated or displayed with the oscilloscope and function generator. One important signal that cannot be generated or tested with the function generator and oscilloscope is the impulse function; there is no way to view the impulse response of a filter directly without using test vectors. The unit impulse represents a particularly good test vector because it is easy to compare the actual impulse response of a digital filter against the expected impulse response. Testing using the impulse response also exposes the entire range of digital frequencies, unlike testing using periodic waveforms generated by the function generator.

Lastly, testing using test vectors allows us to isolate the DSP from the analog input and output section. This is useful because the analog sections have some limitations, including imperfect anti-aliasing and anti-imaging filters. Testing using test vectors allows us to ensure that what we see is due only to the digital signal processing system, and not imperfections in the analog signal or electronics.
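To make the impulse argument concrete, the following Python sketch (a host-side illustration only; the filter coefficients are a made-up example and fir is a hypothetical helper) shows that feeding a unit impulse to an FIR filter returns the coefficients themselves, which is what makes the expected output so easy to check.

```python
def fir(coeffs, x):
    """Direct-form FIR filter: y[n] = sum over k of coeffs[k] * x[n - k]."""
    y = []
    for n in range(len(x)):
        acc = 0.0
        for k, h in enumerate(coeffs):
            if n - k >= 0:
                acc += h * x[n - k]
        y.append(acc)
    return y


coeffs = [0.25, 0.5, 0.25]        # hypothetical example filter
impulse = [1.0] + [0.0] * 7       # unit impulse test vector
print(fir(coeffs, impulse))       # coefficients followed by zeros
```

On the DSP the comparison will generally not be exact to the last bit, since the hardware works in 16-bit fixed point.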

After generating a test vector in MATLAB, save it to a file that can be brought into your code using the MATLAB command save_test_vector (available as save_test_vector.m [20]):

>> save_test_vector('testvect.asm',ch1_in,ch2_in); % Save test vector

(where ch1_in and ch2_in are the input test vectors for input channel 1 and input channel 2; ch2_in can be omitted, in which case both channels of the test-vector input will have the same data.)

[19] Will the expected output and the actual output from the DSP system match perfectly? Why or why not?

[20] http://cnx.rice.edu/author/workgroups/90/m10017/save_test_vector.m

Next, modify your code to include the test-vector support code and the test-vector file you have created. This can be done by replacing the first line of the file (which is an assembler directive to copy in core.asm) with two lines. Instead of:

.copy "core.asm"

use:

.copy "testvect.asm"

.copy "vectcore.asm"

Note that, as usual, the whitespace in front of the .copy directive is required. (Download vectcore.asm [21] into your work directory if you do not already have a copy.)

The test vectors occupy the .etext section of program memory between 08000h and 0FEFFh. If you do not use this section, it will not interfere with your program code or data. This memory block is large enough to hold a test vector of up to 4,000 elements. Both channels of input, and all six channels of output, are stored in each test vector element.

Now assemble and load the file, and reset and run as usual. After a few seconds, halt the DSP (using the Halt command under the Debug window) and verify that the DSP has halted at a branch statement that branches to itself: spin b spin.

Next, the test vector should be saved and loaded back into MATLAB. This is done by saving 6k memory elements (where k is the length of the test vector in samples, and the 6 corresponds to the six output channels) starting with location 08000h in program memory. Do this by choosing File->Data->Save... in Code Composer, then entering the filename output.dat and pressing Enter. Next, enter 0x8000 in the Address field of the dialog box that appears, enter 6k in the Length field, and choose "Program" from the drop-down menu next to "Page." (Always ensure that you use the correct length, six times the length of the test vector, when you save your results.)
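Because the wrong length is an easy mistake to make, it can help to compute the value for the Length field rather than work it out by hand. The Python sketch below does only that arithmetic, using the 6k rule and the .etext bounds given in the text; the function name save_length is hypothetical.

```python
OUTPUT_CHANNELS = 6
ETEXT_WORDS = 0xFEFF - 0x8000 + 1     # size of the .etext region in words


def save_length(vector_len: int) -> int:
    """Words to save from program memory for a test vector of k samples."""
    length = OUTPUT_CHANNELS * vector_len
    if length > ETEXT_WORDS:
        raise ValueError("test vector output would not fit in .etext")
    return length


print(save_length(1000))   # value for the Length field when k = 1000
```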

Last, use the read_vector function (available as read_vector.m [22]) to read the saved test vector output into MATLAB. Do this using the following MATLAB command:

>> [ch1, ch2, ch3, ch4, ch5, ch6] = read_vector('output.dat');

The MATLAB vectors ch1 through ch6 now contain the output of your program code in response to the input from the test vector.

3.2.2 Core File: Accessing External Memory on TI TMS320C54x [23]

3.2.2.1 Introduction

The TI DSP evaluation boards you use have a large amount of memory; in addition to the 32K words internal to the DSP, there are another 256K words of memory installed on the EVM board. For many exercises, the data sets are small, and you worked with only the on-chip memory of the DSP and were not expected to consider how the use of memory impacted performance. However, the large delays often required in audio processing, for example, require that many thousands of samples be stored in memory. There is not enough memory on the DSP microprocessor itself to store a second or more of samples at a 44.1 kHz sample rate, so the off-chip memory must be used.

[21] http://cnx.rice.edu/author/workgroups/90/m10017/vectcore.asm

[22] http://cnx.rice.edu/author/workgroups/90/m10017/read_vector.m

[23] This content is available online at <http://cnx.org/content/m10823/2.7/>.
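The claim that one second of audio does not fit on chip follows from simple arithmetic. The Python sketch below spells it out, assuming one 16-bit word per sample; the constant and function names are hypothetical.

```python
SAMPLE_RATE_HZ = 44_100
ON_CHIP_WORDS = 32 * 1024       # internal RAM (from the text)
OFF_CHIP_WORDS = 256 * 1024     # memory on the EVM board (from the text)


def delay_words(seconds: float, channels: int = 1) -> int:
    """16-bit words needed to hold `seconds` of audio, one word per sample."""
    return round(seconds * SAMPLE_RATE_HZ) * channels


one_second = delay_words(1.0)
print(one_second, one_second > ON_CHIP_WORDS, one_second < OFF_CHIP_WORDS)
```

Since the on-chip RAM also has to hold the program, the stack, and the I/O buffers, the delay that actually fits on chip is shorter still.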

3.2.2.2 EVM Memory Maps

As you have seen, the TI TMS320C54x DSP has two separate memory spaces, called Program and Data. Usually, Program contains your assembled program, and Data contains data, but sometimes it may be convenient or more efficient to violate this convention. (For instance, the firs instruction requires filter coefficients in the Program address space.) The Data space is 64K long and is accessed using the 16-bit auxiliary registers (ARx). Although the Program space is normally accessed using 16-bit literals stored in your program code, the Program space is, in fact, significantly larger than 64K. Using special "extended addressing" instructions, the TI DSP can access up to 8192K-words of memory in the Program space. The extended addressing instructions include far calls and jumps that reset the full 23-bit program counter, as well as accumulator-addressed data-transfer instructions.

3.2.2.2.1 Internal and external memory

In many exercises, it is possible to store program instructions and data entirely in the DSP's on-chip ("internal") memory. This internal memory has several advantages over off-chip ("external") memory: it is much faster (data stored can be accessed without delay), and multiple reads and writes can access the DSP's on-chip memory simultaneously. However, many applications (including the audio delay effect of Using External Memory (Section 2.2.1)) require a data buffer too large to fit into the on-chip memory. For these large buffers, we must use the larger but slower external memory.

When writing programs that require large amounts of memory, use the internal memory to hold your code, filter coefficients, and any small buffers you need. External memory should be used for large buffers that you only access a few times per sample, like the delay buffer described in Audio Effects: Using External Memory (Section 2.2.1).

3.2.2.2.2 TMS320C549x DSP EVM memory maps

Figure 3.4: DSP EVM memory maps

As these memory maps show, the EVM's Data address space is addressed fully by the 16-bit auxiliary registers (ARx) and address-extension words, and the mapping of Data memory is not affected by the OVLY bit. However, because the Program memory space is much larger than can be addressed by the 16-bit addressing register or the 16-bit literals stored in the program, it is split up into 64K (16-bit) pages by the hardware. Normal instructions, such as call, firs, and mvpd, accept only 16-bit addresses, and can therefore only address the current "page" (usually addresses of the form 00xxxxh, which correspond to the addresses the linker uses for your program's code). To access the full 23-bit address space, the DSP offers special accumulator-addressed load, store, and jump instructions.

Further complicating matters is the fact that the OVLY bit affects the mapping of the Program memory space. Recall that before we load our DSP program, we have to change the PMST to FFE0h. We do this to set the OVLY bit in the PMST, which maps the internal memory into both the Program and Data spaces of the DSP. If OVLY is 1, the internal memory appears in both the Data and Program memory address spaces at locations 0080h to 07FFFh. Therefore, with OVLY set, anything written into Data memory below 07FFFh will overwrite a program stored in the same location [24]. In addition, copies of the internal memory also appear in the extended Program address space, occupying locations 0080h-7FFFh of each page. Therefore, with OVLY set, any addresses to Program memory locations in the form of xx0000h-xx7FFFh reference internal memory.
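The paging and overlay rules can be summarized in a small host-side model. The Python sketch below splits a 23-bit Program address into a page number and a 16-bit offset, then applies the overlay range 0080h-7FFFh stated in the text; the function names are hypothetical and the model ignores everything else about the memory map.

```python
def split_program_address(addr: int):
    """Split a 23-bit Program address into (page, 16-bit offset)."""
    addr &= 0x7FFFFF
    return addr >> 16, addr & 0xFFFF


def hits_internal_ram(addr: int, ovly: bool = True) -> bool:
    """With OVLY set, offsets 0080h-7FFFh of every page map to on-chip RAM."""
    _, offset = split_program_address(addr)
    return ovly and 0x0080 <= offset <= 0x7FFF


print(split_program_address(0x023456))            # page and offset
print(hits_internal_ram(0x023456))                # lower half of a page
print(hits_internal_ram(0x028000))                # upper half of a page
print(hits_internal_ram(0x023456, ovly=False))    # overlay disabled
```

The 23-bit width is also where the 8192K-word figure comes from, since 2^23 words is 8192 x 1024 words.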

When OVLY is zero, internal memory is not mapped into the Program space at all; in this case, the Program space includes only external memory. In this mode, all 192K words of external Program RAM are accessible, although several wait states will be required for accessing each item of memory. In the overlay mode, only addresses in the ranges of 08000h-0FF00h, 18000h-1FFFFh, and 28000h-2FFFFh are available to store your data buffers; the remaining addresses are unmapped or map to the on-chip RAM.

To escape this confusion and allow the full 192K-words of external Program RAM to be used for your data buffers, the core file provides mechanisms for manipulating the PMST indirectly. Instead of accessing the external Program RAM directly, we can use the special macros to access the RAM that is normally "hidden" by the internal memory. This allows us to use the full range of external memory available: addresses 000000h-00FF00h and 010000h-02FFFFh. However, since addresses 00FF00h-00FFFFh are reserved by the core file, you must be careful not to write to addresses in this range.

3.2.2.3 Accessing Extended Program RAM

The core file provides two macros for accessing data stored in the external Program RAM: READPROG and WRITPROG. These macros allow the processor to copy data between data memory and external Program memory. Both macros address external Program memory using the value in accumulator A. READPROG reads data from the external Program memory location pointed to by A and writes it to the data memory location pointed to by AR1. WRITPROG reads data from the memory location pointed to by AR1 and writes it to the location in external Program RAM specified by accumulator A. Both macros take one parameter, a count; specifying 1 reads or writes one word from external memory, and specifying some other number n transfers n words starting at the locations pointed to by A and AR1. AR1 is left pointing at the word after the last word read or written; no other registers are modified.

For instance, the following code fragment loads the value contained in memory location 023456h into the location 0064h in data memory using the READPROG macro:

    stm #64h,AR1     ; load 64h into AR1
    ld #02h,16,A     ; load 02h in high part of A
    adds #3456h,A    ; fill in low part of A
                     ; A contains 023456h
    READPROG 1       ; read from 023456h in external Program RAM
                     ; into *AR1 in Data RAM

The WRITPROG macro can be used similarly to write into extended Program RAM:

    stm #64h,AR1     ; load 64h into AR1
    ld #02h,16,A     ; load 02h in high part of A
    adds #3456h,A    ; fill in low part of A
                     ; A contains 023456h
    WRITPROG 1       ; write from *AR1 in Data RAM to
                     ; 023456h in external Program RAM

[24] This is why the memory allocated for your program (6000h-7FFFh) does not overlap with any of the space allocated for the data segments.

Note that Code Composer will not display or allow you to change the contents of the external Program RAM on the memory-dump or disassembly screen, though you can view or change it indirectly by watching the effects of the READPROG and WRITPROG macros on data memory.

3.2.3 Core File: Serial Port Communication Between MATLAB and TI TMS320C54x [25]
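Since the external Program RAM cannot be inspected directly, it can be useful to reason about the macros with a host-side model. The Python sketch below is only such a model, written from the description above (address in A, data pointer in AR1, AR1 left one past the last word); the dictionaries stand in for the two memories and the function names are hypothetical.

```python
def extended_address(page: int, offset: int) -> int:
    """Mirror loading the page into the high part of A, then the offset."""
    return ((page & 0x7F) << 16) + (offset & 0xFFFF)


def readprog(program_ram: dict, data_ram: dict, a: int, ar1: int, count: int) -> int:
    """Copy `count` words from Program address `a` to Data address `ar1`.

    Returns the updated AR1, which points one past the last word written.
    """
    for i in range(count):
        data_ram[ar1 + i] = program_ram.get(a + i, 0)
    return ar1 + count


program_ram = {0x023456: 0xBEEF}     # pretend contents of external RAM
data_ram = {}
ar1 = readprog(program_ram, data_ram, extended_address(0x02, 0x3456), 0x64, 1)
print(hex(data_ram[0x64]), hex(ar1))
```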

3.2.3.1 Using the Serial Port

The core file supports the serial port installed on the TI TMS320C54x DSP. The serial port on the EVM is connected with a cable to COM2 on the PC. Before jumping to your code, the core file initializes the EVM's serial port to 38,400 bits per second (bps) with no parity, eight data bits, and one stop bit (but it may be necessary to restart the DSP completely if the serial port does not work). It then accepts characters received from the PC by the UART (Universal Asynchronous Receiver/Transmitter) and buffers them in memory until your code retrieves them. It also can accept a block of bytes to transmit and send them to the UART in sequence.

Two macros are used to control the serial port: READSER and WRITSER. Both accept one parameter. READSER n reads up to n characters from the serial input buffer (the data coming from the PC) and places them in memory starting at *AR3. (AR3 is left pointing one past the last memory location written.) The actual number of characters read is left in AR1. If AR1 is zero, then no characters were available in the input buffer.

WRITSER n adds n characters starting at *AR3 to the serial output buffer; in other words, it queues them to be sent to the PC. AR3 is left pointing one location after the last memory location read.

Note that READSER and WRITSER modify registers AR0, AR1, AR2, AR3, and BK, as well as the flag TC. Be sure you restore these registers after calling READSER and WRITSER if you need them later in your code.

Note also that the core file allows up to 126 characters to be stored in the input and output buffers. Neither the DSP hardware nor the core file protects against serial-buffer overflows, so you must be careful not to allow the input and output buffers to overflow. (The length of the buffers can be changed by editing the ser_rxlen and ser_txlen values in core.asm [26].) The buffers are 127 characters long; however, the code cannot distinguish between a completely-full and completely-empty buffer. Therefore, only 126 characters can be stored in the buffers.

It is easy to check if the input or output buffers in memory are empty. The input buffer can be checked by comparing the values stored in the memory locations srx_head and srx_tail; if both memory locations hold the same value, the input buffer is empty. Likewise, the output buffer can be checked by comparing the values stored in memory locations stx_head and stx_tail. The number of characters in the buffer can be computed by subtracting the head pointer from the tail pointer; add the length of the buffer (normally 127) if the resulting distance is negative.
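The head/tail arithmetic described above is the usual circular-buffer occupancy calculation. A minimal Python sketch of it follows; it models only the rule stated in the text (the helper names are hypothetical, and on the DSP the head and tail values would be read from srx_head/srx_tail or stx_head/stx_tail).

```python
BUF_LEN = 127   # default buffer length given in the text


def is_empty(head: int, tail: int) -> bool:
    """The buffer is empty when head and tail hold the same value."""
    return head == tail


def chars_in_buffer(head: int, tail: int, buf_len: int = BUF_LEN) -> int:
    """Tail minus head, wrapped around the buffer length if negative."""
    count = tail - head
    if count < 0:
        count += buf_len
    return count


print(chars_in_buffer(10, 20))    # tail ahead of head
print(chars_in_buffer(120, 3))    # tail has wrapped around
print(chars_in_buffer(1, 0))      # fullest distinguishable state
```

The last case shows why only 126 characters fit: a 127th character would make head equal tail again, which is indistinguishable from an empty buffer.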

[25] This content is available online at <http://cnx.org/content/m10821/2.7/>.

[26] http://cnx.rice.edu/author/workgroups/90/m10017/core.asm

The following example shows the minimal amount of code necessary to echo received data back through the serial port. It is available as ser_echo.asm [27].

1 .copy "core.asm"

2

3 .sect ".data"

4 hold .word 0

5

6 .sect ".text"

7 main

8 stm #hold,AR3 ; Read to hold location

9

10 READSER 1 ; Read one byte from serial port

11

12 cmpm AR1,#1 ; Did we get a character?

13 bc main,NTC ; if not, branch back to start

14

15 stm #hold,AR3 ; Write from hold location

16 WRITSER 1 ; ... one byte

17

18 b main

Line 8 sets AR3 to point to the location hold so that READSER will store serial data there. On Line 10, READSER 1 reads one serial byte into hold; the byte is placed in the low-order bits of the word, and the high-order bits are zeroed. If a byte was read, AR1 will be set to 1. AR1 is checked in Line 12; Line 13 branches back to the top if no byte was read. Otherwise, AR3 is reset to hold (since READSER moved it), then on Line 16, WRITSER sends the word received. Finally, Line 18 branches back to the start to receive another character.

3.2.3.2 Using MATLAB to Control the DSP

MATLAB allows you to create a visual interface with standard graphical user-interface (GUI) controls such as sliders, checkboxes, and radio buttons to call MATLAB scripts. The following scripts can be used to create a sample interface:

• ser_set.m [28]: Initializes the serial port and user interface

• wrt_slid.m [29]: Called when sliders are moved to send new data

3.2.3.2.1 Creating a MATLAB user interface

The following code (ser_set.m [30]) initializes the serial port COM2, then creates a minimal user interface consisting of three sliders.

[27] http://cnx.org/content/m10821/latest/ser_echo.asm

[28] http://cnx.rice.edu/author/workgroups/90/m10821/ser_set.m

[29] http://cnx.rice.edu/author/workgroups/90/m10821/wrt_slid.m

[30] http://cnx.rice.edu/author/workgroups/90/m10821/ser_set.m

 1 % ser_set: Initialize serial port and create three sliders
 2
 3 % Set serial port mode
 4 !mode com2:38400,n,8,1
 5
 6 % open a blank figure for the slider
 7 Fig = figure(1);
 8
 9 % open sliders
10
11 % first slider
12 sld1 = uicontrol(Fig,'units','normal','pos',[.2,.7,.5,.05],...
13 'style','slider','value',4,'max',254,'min',0,'callback','wrt_slid');
14
15 % second slider
16 sld2 = uicontrol(Fig,'units','normal','pos',[.2,.5,.5,.05],...
17 'style','slider','value',4,'max',254,'min',0,'callback','wrt_slid');
18
19 % third slider
20 sld3 = uicontrol(Fig,'units','normal','pos',[.2,.3,.5,.05],...
21 'style','slider','value',4,'max',254,'min',0,'callback','wrt_slid');

Line 4 of this code uses the Windows mode command to set up serial port COM2 (which is connected to the DSP) to match the serial port settings on the DSP evaluation board: 38,400 bps, no parity, eight data bits, and one stop bit. Line 7 then creates a new MATLAB figure for the controls; this prevents the controls from being overlaid on any graph you may have already created.
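As a rough check on what these port settings imply, the following Python sketch estimates how many bytes per second the link can carry. It is only an illustration and not part of the lab code: the function name is hypothetical, and it assumes the usual asynchronous framing in which each byte is preceded by one start bit (the text above states only the data, parity, and stop bits).

```python
def serial_bytes_per_second(bps, data_bits=8, parity_bits=0, stop_bits=1, start_bits=1):
    """Estimate payload bytes per second on an asynchronous serial link.

    Hypothetical helper for illustration. The single start bit is an
    assumption (standard asynchronous framing); the remaining defaults
    mirror the settings quoted above: 8 data bits, no parity, 1 stop bit.
    """
    bits_on_wire_per_byte = start_bits + data_bits + parity_bits + stop_bits
    return bps / bits_on_wire_per_byte


# With the settings used by ser_set.m, each byte occupies 10 bit times,
# so the rate is 38400 / 10 = 3840 bytes per second.
rate = serial_bytes_per_second(38400)
print(rate)
```

Under these assumptions, even a burst of slider updates uses only a small fraction of the link capacity.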

Lines 12 through the end create the three sliders for the user interface. Several parameters are used to specify the behavior of each slider. The first parameter, Fig, tells the slider to create itself in the window we created in Line 7. The rest of the parameters are property/value pairs:

• units - normal tells MATLAB to use positioning relative to the window boundaries.
• pos - Tells MATLAB where to place the control.
• style - Tells MATLAB what type of control to place. slider creates a slider control.
• value - Tells MATLAB the default value for the control.
• max - Tells MATLAB the maximum value for the control.
• min - Tells MATLAB the minimum value for the control.
• callback - Tells MATLAB what script to call when the control is manipulated. wrt_slid.m [31] is a MATLAB file that reads the values of the sliders and sends them to the DSP via the serial port.

3.2.3.2.1.1 User-interface callback function

Every time a slider is moved, the file wrt_slid.m [32] is called:

 1 % wrt_slid: write values of sliders out to com port
 2
 3 % open com port for data transfer
 4 fid = fopen('com2:','w');
 5
 6 % send data from each slider
 7 v = round(get(sld1,'value'));
 8 fwrite(fid,v,'uint8');
 9
10 v = round(get(sld2,'value'));
11 fwrite(fid,v,'uint8');
12
13 v = round(get(sld3,'value'));
14 fwrite(fid,v,'uint8');
15
16 % send reset pulse
17 fwrite(fid,255,'uint8');
18
19 % close com port connection
20 fclose(fid);

[31] http://cnx.rice.edu/author/workgroups/90/m10821/wrt_slid.m
[32] http://cnx.rice.edu/author/workgroups/90/m10821/wrt_slid.m

Line 4 of wrt_slid.m [33] opens COM2 for writing. (It has already been initialized by ser_set.m [34].) Then Line 7 reads the value of the first slider using MATLAB's get function to retrieve the value property. The value is then rounded off to create an integer, and the integer is sent as an 8-bit quantity to the DSP in Line 8. (The number that is sent at this step will appear when the serial port is read with READSER in your code.) The values of the other two sliders are then sent in the same way.

Line 17 sends 0xFF (255) to the DSP, which can be used to indicate that the three previously transmitted values represent a complete set of data points. Because the sliders are limited to the range 0 to 254, the value 255 never occurs as slider data. Your code can check for the value 255 to detect and correct synchronization errors.

Line 20 closes the serial port. Note that MATLAB buffers the data being transmitted, and data is often not sent until the serial port is closed. Make sure you close the port after writing a data block to the serial port.
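The framing convention used by wrt_slid.m (three data bytes followed by the marker 0xFF) can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the course's DSP code: on the DSP the receiving side would be written in assembly or C around READSER, and the names SYNC, build_frame, and parse_frames are hypothetical. The resynchronization policy shown (discard any group that does not contain exactly three data bytes before the marker) is one possible choice; the text above does not prescribe one.

```python
import math

SYNC = 0xFF  # marker byte sent after each set of three slider values


def build_frame(values):
    """Build the 4-byte block sent for one slider update (hypothetical helper).

    Values are rounded half up, which matches MATLAB's round for the
    non-negative slider range, then the marker byte is appended.
    """
    data = [int(math.floor(v + 0.5)) for v in values]
    if len(data) != 3 or any(not 0 <= d <= 254 for d in data):
        raise ValueError("expected three values in the range 0..254")
    return bytes(data + [SYNC])


def parse_frames(stream):
    """Recover complete (v1, v2, v3) sets from a received byte stream.

    Bytes are collected until the marker arrives. A set is accepted only if
    exactly three data bytes preceded the marker; otherwise the bytes are
    dropped, so the receiver is back in step at the next marker.
    """
    frames, pending = [], []
    for b in stream:
        if b == SYNC:
            if len(pending) == 3:
                frames.append(tuple(pending))
            pending = []
        else:
            pending.append(b)
    return frames


# A truncated block (one data byte, then the marker) followed by a good one:
received = bytes([7, SYNC]) + build_frame([4, 100.6, 254])
print(parse_frames(received))
```

Because 255 cannot appear as slider data, the marker is unambiguous, and a lost byte corrupts at most one set of values in this sketch.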

3.3 Code Composer

3.3.1 Debugging and Troubleshooting in Code Composer [35]

3.3.1.1 Introduction

Code Composer provides a rich debugging environment that allows you to step through your code, set breakpoints, and examine registers as your code executes. This document provides a brief introduction to some of these debugging features.

3.3.1.2 Debugging Code

3.3.1.2.1 Controlling program flow

Breakpoints are points in the code where execution is stopped and control of the DSP is returned to the debugger, allowing you to view the contents of registers and memory. Breakpoints can be activated or deactivated by double-clicking on any line of code in the disassembly window. [36]

[33] http://cnx.rice.edu/author/workgroups/90/m10821/wrt_slid.m
[34] http://cnx.rice.edu/author/workgroups/90/m10821/ser_set.m
[35] This content is available online at <http://cnx.org/content/m10522/2.9/>.

You may also want to step through your program code, executing one line at a time, to follow branches and watch memory change with the results of calculations. This can be done by choosing the "Step Into" or "Step Over" menu options from the "Debug" pull-down menu. (Unlike "Step Over," "Step Into" traces subroutine calls caused by "call" opcodes.)

Like most DSPs, the DSP we are using is a pipelined processor, which means that instructions execute in several stages over several clock cycles. Unfortunately, our debugger does not "flush" the pipeline of all current instructions when it halts your program; i.e., the DSP does not execute all remaining stages of instructions. As a consequence, when a program halts, the register values shown in the register and memory windows may not actually be the last values written. Often, the values shown correspond to values written several cycles before the current instruction. If it is necessary to know the exact contents of the registers at any particular point in the program flow, simply insert three or more nop (no operation) instructions into your program after the instruction in question. Then, to debug, execute the instruction in question and the nop instructions that follow; this will flush the pipeline.

You can choose the "Run Free" option from the "Debug" pull-down menu to allow your code to run freely, ignoring any breakpoints. The code will continue running until explicitly halted with the "Halt" command.

Note that stopping and restarting execution sometimes confuses the A/D and D/A converters on the six-channel surround-sound board. If this happens, the output will generally go to zero or become completely unrelated to the input signal. This can be fixed by simply resetting the DSP and starting your code from the beginning.

The bar on the left-hand side of the Code Composer Studio window contains shortcuts for many of the commands in the Debug menu.

note: Practice setting breakpoints in your program code and single-stepping by setting a breakpoint after the WAITDATA call and tracing through the program flow for several iterations of the FIR filter code. What code does the WAITDATA call correspond to in the disassembly window?

3.3.1.3 Troubleshooting

The DSP boards can behave unexpectedly. If there is no output, try the following (from less to more drastic):

• Use the Debug menu to halt and reset the DSP, verify that the PMST is set to 0xFFE0, reload the code, reset the DSP, and restart the code.

• Press the "Reset" button on the DSP evaluation board, then use the Code Composer Studio menus to halt, reset the DSP, verify the PMST, reload, reset the DSP again, and restart your code.

• Close Code Composer Studio, then power-cycle the DSP by unplugging power to the DSP board, waiting five seconds, and plugging it back in. Then restart Code Composer Studio. You will need to reset the PMST to 0xFFE0, then reload, reset the DSP, and execute your code.

If code does not load correctly, close Code Composer Studio and power-cycle the DSP.

If problems persist after power-cycling the DSP, ensure that the DSP board is functioning properly by executing previously verified code. Do not forget to set the PMST and to reset the DSP from the Code Composer Studio menu.

If you try all of these steps and still see problems, ask a teaching assistant for help.

[36] They can also be set by pressing F9 on a line in the source-file window. However, verify that the breakpoint appears at the corresponding location in the disassembly window if you do this; there have been problems with breakpoints being set inaccurately by this method in the past.


Bibliography

[1] R. Blahut. Digital Transmission of Information. Addison-Wesley, 1990.

[2] R. Blahut. Digital Transmission of Information. Addison-Wesley, 1990.

[3] J. Dattorro. E�ect design part 1: Reverberator and other �lters. Journal Audio Engineering Society,vol. 45:660�684, September 1996.

[4] R. Dressler. Dolby prologic surround decoder principles of operation.http://www.dolby.com/tech/whtppr.html.

[5] K. Gundry. An introduction to noise reduction. http://www.dolby.com/ken/.

[6] S. Haykin. Adaptive Filter Theory. Prentice Hall, 3rd edition edition, 1996.

[7] Motorola. Implementing IIR/FIR Filters with Motorola's DSP56000/SPS/DSP56001, Digital SignalProcessors. http://merchant.hibbertco.com/mtrlext/fs22/pdf-docs/motorola/apr7.rev2.pdf.

[8] J. G. Proakis and D. G. Manolakis. Digital Signal Processing: Principles, Algorithms, and Applications.Prentice-Hall, Upper Saddle River, NJ, 1996.

[9] J.G. Proakis. Digital Communications. McGraw-Hill, 3rd edition edition, 1995.

[10] J.G. Proakis. Digital Communications. McGraw-Hill, 3rd edition edition, 1995.

[11] L. Rabiner and B. H. Juang. Fundamentals of Speech Recognition. Prentice-Hall, Englewood Cli�s, NJ,1993.




Attributions

Collection: ECE 320 - Spring 2003
Edited by: Douglas L. Jones
URL: http://cnx.org/content/col10096/1.2/
License: http://creativecommons.org/licenses/by/1.0

Module: "Preface for U of I DSP Laboratory"
By: Douglas L. Jones
URL: http://cnx.org/content/m10681/2.12/
Pages: 1-2
Copyright: Douglas L. Jones
License: http://creativecommons.org/licenses/by/1.0

Module: "ECE 320 Course Overview"
By: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Dima Moussa, Daniel Sachs
URL: http://cnx.org/content/m10660/2.26/
Pages: 3-7
Copyright: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Dima Moussa, Daniel Sachs
License: http://creativecommons.org/licenses/by/1.0

Module: "Announcements"
By: Mark Butala
URL: http://cnx.org/content/m10841/2.34/
Pages: 9-11
Copyright: Mark Butala
License: http://creativecommons.org/licenses/by/1.0

Module: "DSP Development Environment: Introductory Exercise for TI TMS320C54x (ECE 420 Specific)"
Used here as: "Lab 0: Hardware Introduction"
By: Mark Butala, Jason Laska
URL: http://cnx.org/content/m11019/2.7/
Pages: 15-21
Copyright: Mark Butala, Jason Laska
License: http://creativecommons.org/licenses/by/1.0

Module: "FIR Filtering: Basic Assembly Exercise for TI TMS320C54x"
Used here as: "Lab 1: Prelab"
By: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Jake Janovetz, Michael Kramer, Dima Moussa, Daniel Sachs, Brian Wade, Jason Laska
URL: http://cnx.org/content/m10022/2.22/
Pages: 23-25
Copyright: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Jake Janovetz, Michael Kramer, Dima Moussa, Daniel Sachs, Brian Wade, Jason Laska
License: http://creativecommons.org/licenses/by/1.0

Module: "FIR Filtering: Exercise for TI TMS320C54x (ECE 320 specific)"
Used here as: "Lab 1: Lab"
By: Mark Butala
URL: http://cnx.org/content/m11020/2.6/
Pages: 26-30
Copyright: Mark Butala
License: http://creativecommons.org/licenses/by/1.0


Module: "IIR Filtering: Introduction"
Used here as: "Lab 2: Theory"
By: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Jake Janovetz, Michael Kramer, Dima Moussa, Daniel Sachs, Brian Wade
URL: http://cnx.org/content/m10025/2.22/
Page: 32
Copyright: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Jake Janovetz, Michael Kramer, Dima Moussa, Daniel Sachs, Brian Wade
License: http://creativecommons.org/licenses/by/1.0

Module: "IIR Filtering: Filter-Design Exercise in MATLAB"
Used here as: "Lab 2: Prelab (part 1)"
By: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Jake Janovetz, Michael Kramer, Dima Moussa, Daniel Sachs, Brian Wade
URL: http://cnx.org/content/m10623/2.11/
Pages: 33-34
Copyright: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Jake Janovetz, Michael Kramer, Dima Moussa, Daniel Sachs, Brian Wade
License: http://creativecommons.org/licenses/by/1.0

Module: "IIR Filtering: Filter-Coefficient Quantization Exercise in MATLAB"
Used here as: "Lab 2: Prelab (part 2)"
By: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Jake Janovetz, Michael Kramer, Dima Moussa, Daniel Sachs, Brian Wade
URL: http://cnx.org/content/m10813/2.5/
Page: 35
Copyright: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Jake Janovetz, Michael Kramer, Dima Moussa, Daniel Sachs, Brian Wade
License: http://creativecommons.org/licenses/by/1.0

Module: "IIR Filtering: Exercise on TI TMS320C54x (ECE 320 specific)"
Used here as: "Lab 2: Lab"
By: Mark Butala
URL: http://cnx.org/content/m11021/2.4/
Pages: 36-37
Copyright: Mark Butala
License: http://creativecommons.org/licenses/by/1.0

Module: "Multirate Filtering: Introduction (ECE 320 specific)"
Used here as: "Lab 3: Theory"
By: Mark Butala
URL: http://cnx.org/content/m10858/2.6/
Page: 39
Copyright: Mark Butala
License: http://creativecommons.org/licenses/by/1.0


Module: "Multirate Filtering: Theory Exercise"
Used here as: "Lab 3: Prelab (part 1)"
By: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Jake Janovetz, Michael Kramer, Dima Moussa, Daniel Sachs, Brian Wade
URL: http://cnx.org/content/m10620/2.14/
Page: 40
Copyright: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Jake Janovetz, Michael Kramer, Dima Moussa, Daniel Sachs, Brian Wade
License: http://creativecommons.org/licenses/by/1.0

Module: "Multirate Filtering: Filter-Design Exercise in MATLAB (ECE 320 specific)"
Used here as: "Lab 3: Prelab (part 2)"
By: Mark Butala
URL: http://cnx.org/content/m10859/2.4/
Page: 41
Copyright: Mark Butala
License: http://creativecommons.org/licenses/by/1.0

Module: "Multirate Filtering: Implementation on TI TMS320C54x (ECE 320 specific)"
Used here as: "Lab 3: Lab"
By: Matthew Berry
URL: http://cnx.org/content/m10617/2.9/
Pages: 42-46
Copyright: Matthew Berry
License: http://creativecommons.org/licenses/by/1.0

Module: "Spectrum Analyzer: Introduction to Fast Fourier Transform (ECE 320 specific)"
Used here as: "Lab 4: Theory"
By: Mark Butala
URL: http://cnx.org/content/m10860/2.6/
Page: 48
Copyright: Mark Butala
License: http://creativecommons.org/licenses/by/1.0

Module: "Spectrum Analyzer: MATLAB Exercise"
Used here as: "Lab 4: Prelab"
By: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Jake Janovetz, Michael Kramer, Dima Moussa, Daniel Sachs, Brian Wade
URL: http://cnx.org/content/m10625/2.8/
Page: 49
Copyright: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Jake Janovetz, Michael Kramer, Dima Moussa, Daniel Sachs, Brian Wade
License: http://creativecommons.org/licenses/by/1.0

Module: "Spectrum Analyzer: Processor Exercise Using C Language"
Used here as: "Lab 4: Lab"
By: Matthew Berry
URL: http://cnx.org/content/m10658/2.10/
Pages: 50-57
Copyright: Matthew Berry
License: http://creativecommons.org/licenses/by/1.0


Module: "Low-Pass Filter Implementation: Introduction"
Used here as: "Lab 5: Introduction"
By: Mark Butala
URL: http://cnx.org/content/m11055/2.3/
Page: 59
Copyright: Mark Butala
License: http://creativecommons.org/licenses/by/1.0

Module: "Low-Pass Fitler Implementation: Filter Specification"
Used here as: "Lab 5: Filter Specification"
By: Mark Butala
URL: http://cnx.org/content/m11056/2.8/
Pages: 60-61
Copyright: Mark Butala
License: http://creativecommons.org/licenses/by/1.0

Module: "Low-Pass Filter Implementation: Prelab"
Used here as: "Lab 5: Prelab"
By: Mark Butala
URL: http://cnx.org/content/m11057/2.6/
Pages: 62-66
Copyright: Mark Butala
License: http://creativecommons.org/licenses/by/1.0

Module: "Low-Pass Filter Implementation: Grading"
Used here as: "Lab 5: Grading"
By: Mark Butala
URL: http://cnx.org/content/m11058/2.4/
Pages: 67-68
Copyright: Mark Butala
License: http://creativecommons.org/licenses/by/1.0

Module: "Digital Receivers: Symbol-Timing Recovery for QPSK"
Used here as: "Digital Receiver: Symbol-Timing Recovery for QPSK"
By: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Jake Janovetz, Michael Kramer, Dima Moussa, Daniel Sachs, Brian Wade
URL: http://cnx.org/content/m10485/2.14/
Pages: 69-80
Copyright: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Jake Janovetz, Michael Kramer, Dima Moussa, Daniel Sachs, Brian Wade
License: http://creativecommons.org/licenses/by/1.0

Module: "Digital Receiver: Carrier Recovery"
By: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Dima Moussa, Daniel Sachs
URL: http://cnx.org/content/m10478/2.16/
Pages: 80-84
Copyright: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Dima Moussa, Daniel Sachs, Jake Janovetz, Michael Kramer, Brian Wade
License: http://creativecommons.org/licenses/by/1.0


Module: "Audio Effects: Using External Memory"
By: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Dima Moussa, Daniel Sachs
URL: http://cnx.org/content/m10480/2.17/
Pages: 84-86
Copyright: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Dima Moussa, Daniel Sachs, Jake Janovetz, Michael Kramer, Brian Wade
License: http://creativecommons.org/licenses/by/1.0

Module: "Audio Effects: Real-Time Control with the Serial Port"
By: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Dima Moussa, Daniel Sachs
URL: http://cnx.org/content/m10483/2.24/
Pages: 86-87
Copyright: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Dima Moussa, Daniel Sachs, Jake Janovetz, Michael Kramer, Brian Wade
License: http://creativecommons.org/licenses/by/1.0

Module: "Surround Sound: Passive Encoding and Decoding"
By: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Jake Janovetz, Michael Kramer, Dima Moussa, Daniel Sachs, Brian Wade
URL: http://cnx.org/content/m10484/2.13/
Pages: 87-89
Copyright: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Jake Janovetz, Michael Kramer, Dima Moussa, Daniel Sachs, Brian Wade
License: http://creativecommons.org/licenses/by/1.0

Module: "Surround Sound: Chamberlin Filters"
By: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Jake Janovetz, Michael Kramer, Dima Moussa, Daniel Sachs, Brian Wade
URL: http://cnx.org/content/m10479/2.15/
Pages: 89-92
Copyright: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Jake Janovetz, Michael Kramer, Dima Moussa, Daniel Sachs, Brian Wade
License: http://creativecommons.org/licenses/by/1.0

Module: "Adaptive Filtering: LMS Algorithm"
By: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Dima Moussa, Daniel Sachs
URL: http://cnx.org/content/m10481/2.14/
Pages: 93-95
Copyright: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Dima Moussa, Daniel Sachs, Jake Janovetz, Michael Kramer, Brian Wade
License: http://creativecommons.org/licenses/by/1.0

Module: "Speech Processing: Theory of LPC Analysis and Synthesis"
By: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Jake Janovetz, Michael Kramer, Dima Moussa, Daniel Sachs, Brian Wade
URL: http://cnx.org/content/m10482/2.19/
Pages: 95-98
Copyright: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Jake Janovetz, Michael Kramer, Dima Moussa, Daniel Sachs, Brian Wade
License: http://creativecommons.org/licenses/by/1.0


Module: "Speech Processing: LPC Exercise in MATLAB"
By: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Jake Janovetz, Michael Kramer, Dima Moussa, Daniel Sachs, Brian Wade
URL: http://cnx.org/content/m10824/2.5/
Pages: 98-99
Copyright: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Jake Janovetz, Michael Kramer, Dima Moussa, Daniel Sachs, Brian Wade
License: http://creativecommons.org/licenses/by/1.0

Module: "Speech Processing: LPC Exercise on TI TMS320C54x"
By: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Jake Janovetz, Michael Kramer, Dima Moussa, Daniel Sachs, Brian Wade
URL: http://cnx.org/content/m10825/2.6/
Pages: 99-100
Copyright: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Jake Janovetz, Michael Kramer, Dima Moussa, Daniel Sachs, Brian Wade
License: http://creativecommons.org/licenses/by/1.0

Module: "Video Processing Manuals"
Used here as: "Video Processing: Manuals"
By: Mark Butala
URL: http://cnx.org/content/m10889/2.5/
Page: 100
Copyright: Mark Butala
License: http://creativecommons.org/licenses/by/1.0

Module: "Introduction to the IDK"
Used here as: "Video Processing: Introduction to the IDK"
By: Mark Butala
URL: http://cnx.org/content/m10926/2.7/
Pages: 100-106
Copyright: Mark Butala
License: http://creativecommons.org/licenses/by/1.0

Module: "Two's Complement and Fractional Arithmetic for 16-bit Processors"
By: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Dima Moussa, Daniel Sachs, Jason Laska
URL: http://cnx.org/content/m10808/2.9/
Pages: 107-109
Copyright: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Dima Moussa, Daniel Sachs, Jason Laska
License: http://creativecommons.org/licenses/by/1.0

Module: "Addressing Modes for TI TMS320C54x"
By: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Dima Moussa, Daniel Sachs
URL: http://cnx.org/content/m10806/2.7/
Pages: 109-114
Copyright: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Jake Janovetz, Michael Kramer, Dima Moussa, Daniel Sachs, Brian Wade
License: http://creativecommons.org/licenses/by/1.0


Module: "Core File: Introduction to Six-Channel Board for TI EVM320C54"
By: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Dima Moussa, Daniel Sachs
URL: http://cnx.org/content/m10513/2.13/
Pages: 114-121
Copyright: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Jake Janovetz, Michael Kramer, Dima Moussa, Daniel Sachs, Brian Wade
License: http://creativecommons.org/licenses/by/1.0

Module: "Core File: Accessing External Memory on TI TMS320C54x"
By: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Dima Moussa, Daniel Sachs
URL: http://cnx.org/content/m10823/2.7/
Pages: 121-125
Copyright: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Jake Janovetz, Michael Kramer, Dima Moussa, Daniel Sachs, Brian Wade
License: http://creativecommons.org/licenses/by/1.0

Module: "Core File: Serial Port Communication Between MATLAB and TI TMS320C54x"
By: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Dima Moussa, Daniel Sachs
URL: http://cnx.org/content/m10821/2.7/
Pages: 125-128
Copyright: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Jake Janovetz, Michael Kramer, Dima Moussa, Daniel Sachs, Brian Wade
License: http://creativecommons.org/licenses/by/1.0

Module: "Debugging and Troubleshooting in Code Composer"
By: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Dima Moussa, Daniel Sachs
URL: http://cnx.org/content/m10522/2.9/
Pages: 128-129
Copyright: Douglas L. Jones, Swaroop Appadwedula, Matthew Berry, Mark Haun, Jake Janovetz, Michael Kramer, Dima Moussa, Daniel Sachs, Brian Wade
License: http://creativecommons.org/licenses/by/1.0


ECE 320 - Spring 2003

Development of real-time digital signal processing (DSP) systems using a DSP microprocessor; several structured laboratory exercises, such as sampling and digital filtering, followed by an extensive DSP project of the student's choice.
