
Page 1: MSc THESISce-publications.et.tudelft.nl/publications/741_voiceover...MSc THESIS Voice-over IP implementation on a Field Programmable Gate Array M. van den Braak Abstract Faculty of

Computer Engineering
Mekelweg 4, 2628 CD Delft
The Netherlands
http://ce.et.tudelft.nl/

2006

MSc THESIS

Voice-over IP implementation on a Field Programmable Gate Array

M. van den Braak

Faculty of Electrical Engineering, Mathematics and Computer Science

CE-MS-2006-07

Abstract

The Internet provides a large number of services, e.g., e-commerce, file-sharing, and email, allowing people from all over the world to do business, exchange data, and communicate, respectively. A service that is gaining popularity is Voice-over-IP (VoIP), which allows people to communicate with each other without utilizing the plain old telephone service (POTS), even though telephone lines may be used to carry the digital data. Due to this popularity, more hardware VoIP phones are being developed. Most of these embedded systems are based on general-purpose processors. As new codecs and new encryption algorithms emerge, performance must be increased, and higher clock rates may not always result in a proportional performance gain. Additionally, it is impractical to include dedicated chips to improve performance. Consequently, reconfigurable hardware is increasingly often used to improve performance and maintain flexibility. In this thesis, we describe our work in building a VoIP phone using reconfigurable hardware to accelerate the most time-consuming operations. Our approach is as follows. First, we implement VoIP in software. Second, we determine the most time-consuming operations through profiling; these turned out to be GSM encoding and GSM decoding. In our first attempt, we implemented a complete GSM decoder in hardware. Although it achieved a speedup of 34 times, it exceeded the number of slices reserved for hardware acceleration. More detailed profiling of the GSM encoder and decoder revealed that the Short Term Analysis Filter and the Long Term Predictor within the GSM encoder were responsible for 75% of the encoding time. The Short Term Synthesis Filter was found to be responsible for 80% of the GSM decoding time. Finally, we implemented these parts in hardware, resulting in a total speedup of 2.7 times for the GSM encoder and 4.2 times for the GSM decoder, while only occupying 11% of the total area for hardware acceleration.


Voice-over IP implementation on a Field Programmable Gate Array

THESIS

submitted in partial fulfillment of the requirements for the degree of

MASTER OF SCIENCE

in

COMPUTER ENGINEERING

by

M. van den Braak
born in Purmerend, The Netherlands

Computer Engineering
Department of Electrical Engineering
Faculty of Electrical Engineering, Mathematics and Computer Science
Delft University of Technology



Laboratory: Computer Engineering
Codenumber: CE-MS-2006-07

Committee Members:

Advisor: J. S. S. M. Wong, CE, TU Delft

Chairperson: S. Vassiliadis, CE, TU Delft

Member: W. A. Serdijn, Elca, TU Delft

Member: G. Brown, CS, Indiana University (USA)


To those who made this possible

Wise men talk because they have something to say; fools, because they have to say something. - Plato (427 - 347 BC)


Contents

List of Figures
List of Tables
Acknowledgements

1 Introduction
  1.1 Introduction
  1.2 Motivation
  1.3 Project definition
  1.4 Project goals and approach
  1.5 Thesis overview

2 Background
  2.1 Related work
  2.2 Signaling
    2.2.1 Signaling protocols
    2.2.2 SIP protocol
    2.2.3 SIP Authentication
    2.2.4 Setting up a SIP connection: Inviting a friend
  2.3 RTP: Real-time Transport Protocol
  2.4 Voice compression
    2.4.1 Codec choice
    2.4.2 GSM encoder
    2.4.3 GSM decoder
  2.5 Local networks
    2.5.1 NAT: Network Address Translation
    2.5.2 The NAT problem
    2.5.3 STUN: Simple Traversal of UDP over NAT
  2.6 Additional functionalities
    2.6.1 DHCP: Dynamic Host Configuration Protocol
    2.6.2 DNS lookup: Domain Name Service
    2.6.3 NTP: Network Time Protocol
  2.7 Conclusions

3 Implementation details
  3.1 Hardware
    3.1.1 Buses
    3.1.2 Memory
    3.1.3 Ethernet
    3.1.4 AC97
    3.1.5 PS/2
    3.1.6 VGA
    3.1.7 RS232
    3.1.8 Interrupts
  3.2 Software
    3.2.1 Bus structure
    3.2.2 Ethernet
  3.3 Timing Measurement
  3.4 Reconfigurable hardware design
    3.4.1 GSM Decoder
    3.4.2 Short Term Filters
    3.4.3 Cross Correlation
    3.4.4 Optimization
  3.5 Conclusions

4 Experimental Results
  4.1 Timing results software version
  4.2 Hardware acceleration results
  4.3 Communication overhead
  4.4 Conclusions

5 Conclusions
  5.1 Summary
  5.2 Main contributions
  5.3 Recommendations for further research

Bibliography

A Software description
  A.1 Source file deployment
  A.2 Main loop
  A.3 Functions
    A.3.1 Linker script

B Hardware description
  B.1 Crosscorrelation
    B.1.1 FSM pseudocode
    B.1.2 Synthesis report
    B.1.3 Simulation
  B.2 Dual Lattice Filter
    B.2.1 Synthesis report
    B.2.2 Simulation

C Abbreviations


List of Figures

1.1 Voice-over IP system

2.1 SIP request example
2.2 SIP response example
2.3 SIP REGISTER request sequences
2.4 SIP INVITE sequence
2.5 SIP BYE sequence
2.6 RTP header (source: RFC 3550 by IETF)
2.7 GSM Encoder (source: GSM: Digital Cellular Telecommunications System (Phase 2); Full Rate Speech; part 2: Transcoding (GSM 06.10 version 4.2.0) by ETSI, page 13)
2.8 Short Term Analysis Filter
2.9 Short Term Analysis
2.10 Short Term Analysis
2.11 GSM Decoder (source: GSM: Digital Cellular Telecommunications System (Phase 2); Full Rate Speech; part 2: Transcoding (GSM 06.10 version 4.2.0) by ETSI, page 14)
2.12 Short Term Synthesis Filter
2.13 Short Term Synthesis
2.14 Network Address Translation

3.1 XUP board Architecture and internal Virtex structure
3.2 Ethernet core connections
3.3 Ethernet Frame data format (source: Xilinx CoreLogic PLB MAC controller)
3.4 AC97 core implemented in the FPGA
3.5 AC97 communication signals (source: datasheet LM4550, National Semiconductor)
3.6 AC97 audio frame (source: datasheet LM4550, National Semiconductor)
3.7 PS/2 core
3.8 VGA core
3.9 Example VGA output
3.10 RS232 core
3.11 Address ranges of modules
3.12 GSM hardware decoder module architecture
3.13 Lattice filter module architecture
3.14 Cross-correlation module architecture
3.15 Subframe BRAM contents
3.16 Dual lattice filter module architecture

4.1 Processor usage
4.2 GSM profiling
4.3 Processor usage with hardware acceleration
4.4 GSM profiling with hardware acceleration


List of Tables

2.1 H.323 vs SIP
2.2 Codec characteristics


Acknowledgements

While I worked on my MSc. project, several people had a positive influence on my progress and I would like to thank them in general. Some of them I would like to thank in particular. First, I thank my supervisor Stephan Wong. He initiated this project and gave me the possibility to work on it. He also spent a lot of time reading, editing, commenting on and structuring the publication and my thesis report. I thank Bert Meijs for modifying the network for testing and demonstrating my VoIP phone at the TU Delft. I thank Dennis, Annabel and Ria for commenting on and editing my thesis and always believing that I would finish my study. I thank Bert for creating the perfect ambiance for working on my thesis in IJzendijke. Finally, I would like to thank my parents, Joke and Kees, for giving me the opportunity to study at the university and supporting me from the sidelines.

M. van den Braak
Delft, The Netherlands
June 14, 2006


1 Introduction

The DVS Phone^1 is the physical result of this Master of Science project, which aims to construct a VoIP phone and to speed up the GSM speech codec. This report elaborates on the design and implementation of this VoIP phone. Section 1.1 presents an introduction to Voice-over IP. Section 1.2 gives the motivation for this M.Sc. project. Section 1.3 describes the main requirements for this project. Section 1.4 elaborates on the project goals and the approach used to achieve them. Section 1.5 presents the framework of this thesis.

1.1 Introduction

The Internet offers many services, e.g., e-mail, instant messaging, online banking, and file-sharing. Another service that is gaining popularity is Voice over IP (VoIP). VoIP enables people to make phone calls, using the Internet for voice transportation. The greatest benefit of VoIP over the POTS (Plain Old Telephone System) is that expenses can be lowered considerably. The reason is that Internet connections are becoming increasingly common while their cost is decreasing. In addition, there is no cost associated with call establishment and there is no per-minute rate. These factors make VoIP virtually free. Another advantage of VoIP is that when a person moves, the VoIP phone automatically registers its new location, which makes VoIP transparent in this respect.

Figure 1.1: Voice-over IP system (blocks shown: microphone, A/D and packetizer at the sender; IP network carrying IP packets; depacketizer, playback buffer, D/A and speaker at the receiver)

A typical VoIP system is depicted in Figure 1.1 and an explanation of the parts constituting this system is given in the following. First, at the sender side, voice sound is picked up using a microphone. Second, this voice signal is translated to a digital representation by an A/D (analog-to-digital) converter. Third, this bitstream is packed into IP packets and sent along an IP network, which in most cases is the Internet. Fourth, at the receiver side, these packets are stripped of IP and other headers. Fifth, the samples are put in a playback buffer. This buffer is needed to compensate for the variation of delay over the network, called jitter. Finally, a D/A (digital-to-analog) converter converts the voice data back into an analogue signal. Notice that most VoIP systems are two-way systems; therefore, the same chain must be replicated in the reverse direction.

^1 DVS Phone is an abbreviation of Delft VoIP SIP Phone.
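The role of the playback buffer in the chain described above can be illustrated with a small sketch. It is not the buffer used in the DVS Phone; the class name PlaybackBuffer, the prefill parameter and the use of a per-packet sequence number are assumptions made for illustration only. The buffer collects a few packets before playback starts, releases them in sequence order, and reports a gap when a packet is lost or arrives too late.

```python
import heapq


class PlaybackBuffer:
    """Toy jitter buffer (hypothetical): reorders packets by sequence number."""

    def __init__(self, prefill=3):
        self.prefill = prefill   # packets to collect before playback starts
        self.heap = []           # min-heap of (sequence number, payload)
        self.started = False
        self.next_seq = None     # sequence number of the next playout slot

    def push(self, seq, payload):
        # A packet whose playout slot has already passed is useless: drop it.
        if self.next_seq is not None and seq < self.next_seq:
            return
        heapq.heappush(self.heap, (seq, payload))

    def pop(self):
        """Return the payload for the next playout slot, or None for a gap."""
        if not self.started:
            if len(self.heap) < self.prefill:
                return None      # still filling, the player outputs silence
            self.started = True
            self.next_seq = self.heap[0][0]
        seq = self.next_seq
        self.next_seq += 1
        # Discard stale entries (for example duplicates of played packets).
        while self.heap and self.heap[0][0] < seq:
            heapq.heappop(self.heap)
        if self.heap and self.heap[0][0] == seq:
            return heapq.heappop(self.heap)[1]
        return None              # packet lost or still in flight
```

A larger prefill tolerates more jitter at the cost of additional end-to-end delay, which is the trade-off that playback buffer schemes try to balance.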

1.2 Motivation

Since VoIP is an emerging service, the number of hardware VoIP phones in homes and offices is also increasing. These embedded systems usually contain a general-purpose processor, mostly running at lower clock rates. However, more performance will be needed when new functionalities (e.g., video conferencing, encryption) are added to the system in the near future. Increasing clock rates may not always result in a proportional increase of performance. Utilizing dedicated hardware is an adequate solution; however, it is not flexible and requires a long design trajectory. Reconfigurable hardware does not have these drawbacks. It offers the execution speed of dedicated hardware while maintaining flexibility. Reconfigurable hardware permits a short time-to-market and is therefore increasingly used for performance enhancements in embedded systems [7][9]. In a previous investigation [29], we showed that reconfigurable hardware can contribute to speeding up compute-intensive parts. To this end, we build a VoIP phone in an embedded environment, in which reconfigurable hardware is used to speed up the compute-intensive parts.

1.3 Project definition

This project entails the design and implementation of a complete VoIP Phone, using the Xilinx University Program (XUP) experimentation board. The most compute-intensive operations on the VoIP Phone will be accelerated using reconfigurable hardware, raising the possibility of adding functionalities. The requirements of the VoIP Phone are minimal, which in this project means: the VoIP Phone should be capable of connecting to other VoIP Phones^2 and exchanging speech data using the Internet. However, for proper control and convenience more functions must be implemented, as will become clear in Chapter 2.

1.4 Project goals and approach

The main goal of this thesis is to create a complete VoIP phone with minimal requirements in hardware. Reconfigurable hardware will be used to accelerate the most compute-intensive parts of the software. Using reconfigurable hardware reduces the processor usage of the Virtex II Pro. This creates the possibility to add more features to the DVS Phone in the future, e.g., conferencing, video support, and encryption.

Other projects in which reconfigurable hardware and software co-operate [8][9] have shown that, besides the faster execution of an operation in hardware than in software, the communication between the two must also be taken into account. As a result, the speedup of the total application is influenced by the amount of data that is exchanged between hardware and software.

^2 Only VoIP phones that have implemented the same protocols for signaling, speech codecs, etc.
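The influence of communication on the overall speedup can be made concrete with an Amdahl-style estimate. The sketch below is an illustration under stated assumptions, not the calculation used in this thesis: the function name overall_speedup and the way the communication cost is modelled (as a fraction of the original software runtime) are hypothetical, and the kernel speedup of 10 and overhead of 5% in the usage note are made-up values.

```python
def overall_speedup(fraction, kernel_speedup, comm_overhead=0.0):
    """Estimate the speedup of a whole operation after partial acceleration.

    fraction       -- share of the software runtime spent in the part that
                      moves to hardware (between 0 and 1)
    kernel_speedup -- speedup of that part in isolation
    comm_overhead  -- time spent moving data between software and hardware,
                      expressed as a share of the original runtime
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie between 0 and 1")
    if kernel_speedup <= 0.0 or comm_overhead < 0.0:
        raise ValueError("kernel_speedup must be positive, overhead non-negative")
    # Normalised new runtime: untouched part + accelerated part + transfers.
    new_time = (1.0 - fraction) + fraction / kernel_speedup + comm_overhead
    return 1.0 / new_time
```

For example, if 75% of the runtime is accelerated by a factor of 10, overall_speedup(0.75, 10.0) gives roughly 3.1; adding a communication overhead of 5% of the original runtime, overall_speedup(0.75, 10.0, 0.05), lowers this to roughly 2.7. Even an infinitely fast kernel cannot exceed a factor of 4 in this case, because the remaining 25% still runs in software.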

To achieve our main goal, the approach will be the following:

• Designing and implementing a VoIP phone in software.

• Profiling of the different operations running on the VoIP phone to determine the most time-consuming operations.

• Porting the most time-consuming operations to reconfigurable hardware.

• Testing, verifying and debugging the application.

1.5 Thesis overview

This thesis is organized as follows. Chapter 2 gives an overview of related work and relevant background information, and states concrete requirements for the implementation. Chapter 3 describes in detail the implementation of the software as well as the hardware. In addition, the design for hardware acceleration is described. Chapter 4 presents the timing results for the pure software version and the hardware-accelerated version. It also gives the calculations for the overall speedup. Chapter 5 states the conclusions and gives recommendations for further research.

Many abbreviations are common in this field of research; for a list of the abbreviations used, the reader is referred to Appendix C.


2 Background

As stated in Chapter 1, a minimal VoIP implementation requires two functionalities. First, it should be able to connect to other VoIP phones and, second, voice data should be carried by the Internet. The first requirement is fulfilled by using signaling (discussed in Section 2.2). Sending voice data over the Internet can be realized by putting a number of samples (PCM sampling) in an IP packet. However, sending raw samples over the Internet consumes a lot of bandwidth; consequently, reducing the bitrate using compressing codecs is desirable. Moreover, the receiving side is unable to determine which codec is used when only voice data is sent. Additional information on the interpretation of the audio data must be sent along. The Real-time Transport Protocol (RTP) is provided for this purpose.
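The bandwidth argument above can be made concrete with a short calculation. The figures used here are assumptions for illustration and are not taken from this chapter: telephone-quality PCM at 8 kHz with 8 bits per sample, one packet every 20 ms, a GSM 06.10 frame of 260 bits carried as 33 bytes, and minimal IPv4, UDP and RTP headers of 20, 8 and 12 bytes. The helper name stream_kbps is hypothetical.

```python
# Assumed minimal header sizes in bytes: IPv4 (20) + UDP (8) + RTP (12).
HEADER_BYTES = 20 + 8 + 12


def stream_kbps(payload_bytes, frame_ms, with_headers=True):
    """Bitrate in kbit/s of a stream sending one packet every frame_ms ms."""
    size = payload_bytes + (HEADER_BYTES if with_headers else 0)
    return size * 8 / frame_ms   # bits per millisecond equals kbit/s


# 20 ms of 8 kHz, 8-bit PCM: 160 samples of one byte each (assumed format).
pcm_payload = 8000 * 20 // 1000
# One GSM 06.10 frame also covers 20 ms and is assumed to occupy 33 bytes.
gsm_payload = 33

pcm_rate = stream_kbps(pcm_payload, 20, with_headers=False)   # codec only
gsm_rate = stream_kbps(gsm_payload, 20, with_headers=False)   # codec only
pcm_on_wire = stream_kbps(pcm_payload, 20)                    # incl. headers
gsm_on_wire = stream_kbps(gsm_payload, 20)                    # incl. headers
```

Under these assumptions the codec alone reduces the payload from 64 kbit/s to about 13 kbit/s, while the per-packet headers add a fixed 16 kbit/s to either stream, so the saving on the wire is smaller than the codec ratio suggests.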

Section 2.1 relates other studies on VoIP and reconfigurable hardware to our project. Section 2.2 describes signaling and in particular the SIP protocol. Section 2.3 describes the Real-time Transport Protocol, used to carry audio data. Section 2.4 describes techniques to reduce bandwidth when sending voice data over a network, in particular the implemented GSM 06.10 standard. During development, problems were encountered when using the DVS Phone on a LAN, and effort was needed to relieve these common NAT problems. LANs, the NAT problem and some solutions are discussed in Section 2.5. Finally, Section 2.6 presents some functionalities that enhance the use of the DVS Phone. Section 2.7 concludes this chapter.

2.1 Related work

In the mid-nineties, VoIP made its first entrance in the industry [1]. Companies used VoIP as an alternative to the expensive traditional telephone system. International phone costs in particular could be reduced drastically. At that time, the savings compensated for the bad quality of VoIP. When telephone companies reduced their rates, VoIP and POTS started competing. Consequently, the amount of research on VoIP quality has exploded in the last decade. VoIP quality mainly concerns call setup delay [11] and perceptive speech quality. Significantly less research has been performed on the former than on the latter. We must mention the study of [17], which proposes a signaling protocol implementation utilizing reconfigurable hardware. Perceptive speech quality is influenced by two aspects. First, the intrinsic quality depends on the codec used; second, network impairments have a negative influence on the speech quality. Both are reviewed in the following.

The codec is responsible for the intrinsic speech quality of VoIP. CELP codecs (linear predictive codecs are discussed in Section 2.4) achieve the largest bitrate reduction while maintaining good speech quality. Moreover, these codecs are computationally the most complex and are therefore suited for implementation utilizing reconfigurable hardware. In [26] an FPGA implementation of a DSP core is suggested. The DSP core is a 16-bit fixed-point processor that is capable of GSM vocoding^1. [20] used reconfigurable hardware to reduce power, with respect to a DSP, for the ACS (Algebraic Codebook Search), which is the most computationally intensive part of the codebook vocoders. [22] presents a complete G.729 vocoder implementation utilizing reconfigurable hardware.

Many investigations assess perceptive speech quality in terms of network impairments (e.g., jitter, packet loss) [21]. Studies on the reliability of the Internet include [13], which discusses the maximum number of VoIP calls in a wireless network, [4], which investigates the effect of link failures on VoIP quality, and general investigations on the QoS of the Internet in relation to VoIP [5][12][19]. [2] proposes to utilize overlay networks to improve the QoS of the Internet. In addition, network improvements must be distinguished from the compensation for network impairments in the end-points. End-point research focuses on two aspects: playback buffer schemes to compensate for jitter [23][25][24] and Packet Loss Concealment (PLC) to compensate for packet loss. [18] uses reconfigurable hardware at the receiver side to improve echo cancellation and PLC.

Moreover, to the best of our knowledge, no research has been performed on VoIP utilizing reconfigurable hardware. Combining these two techniques is a new field of research.

2.2 Signaling

Signaling is the communication needed to find other VoIP phones on the Internet. It is the control part of making a phone call. This involves communication for the registration of users, but also for setting up, modifying and tearing down a phone connection.

When communication has to be set up, the location of the VoIP phone to be called has to be known. In VoIP, this means that the IP address and port number have to be known, because with this information any host on the Internet can be reached. The solution to this problem is keeping a database with all registered users along with their IP addresses and port numbers. The servers that hold these databases are globally locatable and play an intermediate role at call setup. This implies that users have to register themselves at these servers when connecting to the Internet. An advantage is that an IP address has (almost) no relation with a geographical location. This means that when a person moves, only the database has to be updated with a new IP address and this person is reachable at the new location. If someone wants to call a contact, the equivalent of a phone number has to be known. In VoIP, this generally is an address consisting of the following parts in this order: protocol name, colon, username, @ (at), servername. Examples are sip:[email protected] and h323:[email protected]. The exact URI syntaxes are defined in the signaling protocols. For Alice, this means that she has to register herself at the server at wonderland.com in order to be reachable. The server at wonderland.com maintains the location database for alice and other users.
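The address format described above (protocol name, colon, username, @, servername) can be split mechanically. The following is a minimal sketch only: the names `VoipAddress` and `split_voip_address` and the sample address are illustrative assumptions, and real SIP and H.323 URIs allow further parts (ports, parameters) that are ignored here.

```python
from typing import NamedTuple


class VoipAddress(NamedTuple):
    protocol: str  # e.g. "sip" or "h323"
    username: str  # part before the "@"
    server: str    # server that maintains the location database


def split_voip_address(address: str) -> VoipAddress:
    """Split 'protocol:username@servername' into its three parts."""
    protocol, sep, rest = address.partition(":")
    if not sep or not protocol:
        raise ValueError("missing protocol name")
    username, sep, server = rest.partition("@")
    if not sep or not username or not server:
        raise ValueError("expected username@servername")
    return VoipAddress(protocol, username, server)


# Hypothetical address, used only for illustration.
example = split_voip_address("sip:alice@wonderland.com")
print(example.protocol, example.username, example.server)
```

The server part tells a caller which registrar to ask for the current IP address and port number of the user.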

¹ Vocoding is a contraction of Voice coding, which includes encoding and decoding.

First, the two most widely used signaling protocols are discussed in the next section. Second, the SIP protocol is chosen and this choice is motivated. Third, the most important features of the SIP protocol are presented.

2.2.1 Signaling protocols

There are two widely used protocols: SIP (Session Initiation Protocol), recommended by the IETF (Internet Engineering Task Force), and H.323², recommended by the ITU (International Telecommunication Union). Both protocols are in use today; however, they are far from compatible. Table 2.1 lists the most important features of each protocol.

H.323 | SIP
------|----
Longer call setup time | Simple, scalable and extensible
Twelve packets for call setup | Four packets for call setup
Provides floor control within a session³ | Cannot provide floor control
Requires both TCP and UDP during call setup | Basically runs on UDP, but also supports TCP

Table 2.1: H.323 vs SIP

Two major advantages of SIP are the small number of packets needed for call setup and the fact that it can run on UDP only. The main difference between UDP (User Datagram Protocol) and TCP (Transmission Control Protocol) is that TCP is a reliable way to send data, because the data is acknowledged, ordered and resent if not correctly received. UDP just sends data packets, regardless of whether they arrive, with a minimum of protocol overhead. Besides signaling, UDP can in this project also be used for the transmission of the voice data. For voice data, low rates of packet loss are acceptable, although sound quality deteriorates when packets are lost in transport. In VoIP systems, delay is also a critical factor. When using TCP, a packet is resent if it is not received correctly. This retransmission consumes much time and increases delay considerably. Since UDP can be used for signaling as well as voice transport, SIP is chosen as the signaling protocol. The next sections elaborate on the SIP protocol.

2.2.2 SIP protocol

² The H.323 standard is a set of protocols, including H.225, H.245, H.450, and RAS.

³ Floor control provides a way for conferences to control who can input to the media. An example is that a moderator of a conference is able to let one person speak at a time.

The latest SIP protocol is defined in RFC 3261 [15], although the obsolete RFC 2543 [16] must still be supported. The DVS Phone is RFC 2543 compliant; future versions must be updated to the newer SIP standard. This includes the implementation of TCP, since SIP packets that are larger than the MTU cannot be handled by UDP. The SIP protocol resembles other IETF protocols, e.g., HTTP and SMTP. Each message is a human-readable text message, which makes debugging easier. There are four types of SIP network elements: User Agent, Proxy server, Redirect server, and Registrar server. All elements are explained below.

• User Agent. A User Agent physically appears as a software application or a hardware SIP phone. A User Agent Client (UAC) must be distinguished from a User Agent Server (UAS). A UAC generates request messages and sends them to a UAS. The UAS generates responses to those request messages and sends the response messages back to the UAC. The request and response messages are described below. A User Agent fulfills the tasks of both a UAC and a UAS.

• Proxy server. A Proxy server is capable of forwarding messages and, if needed, modifying them. On behalf of the initiator, the proxy server can initiate new requests, interpret replies and send an unambiguous reply back to the initiator. It relieves a UA from an extensive search when finding a contact, because a Proxy server can fork a request (i.e., generate requests to more than one other network element). A Proxy server is also a good place to implement policy rules.

• Redirect server. A Redirect server is a server that responds to an INVITE (call setup initiation, see below) message. If the person that is to be found temporarily wants to be contacted at another SIP address, this new address is passed to the initiator of the INVITE message. The difference between a proxy server and a redirect server is that a proxy server directly forwards a message by initiating a new request. A redirect server only answers with a response message to the initiator, which, in turn, must formulate a new request.

• Registrar server. A Registrar server is a server that accepts REGISTER requests. A User Agent sends a REGISTER request at startup with its contact information (the IP address and port number at which the User Agent can be reached). The Registrar server stores the contact information in its database. When a user registered at the Registrar server is INVITEd, the contact information is used to set up a connection between the two User Agents. The procedure for REGISTERing at this server is depicted in Figure 2.3(a).

In the SIP protocol, there are two types of messages: requests and responses. The first line of a message determines whether it is a request or a response. The rest of the message has the same structure in both cases, namely a header and a body. SIP itself only operates on the header of a message. A SIP message may have a body, which is separated from the header by one blank line (CR/LF). The SIP request and response are discussed in the following.

If the first word in a message is a command in capitals, the message is a request and the command is called the method. This first line should end with SIP/2.0. An example of a request is depicted in Figure 2.1, where the request method is INVITE. Other examples are REGISTER, CANCEL, ACK, and BYE. The INVITE method is accompanied by the URI of the person that is invited.

If the first line starts with SIP/2.0, it is a response message. The next parameter is a three-digit number, which is a numerical code for the response, followed by a human-readable explanation of the response. An example of a response is depicted in Figure 2.2.
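The rule for telling requests and responses apart can be sketched as follows. This is an illustrative sketch, not the DVS Phone code; the function name `classify_start_line` and the sample URI are assumptions.

```python
def classify_start_line(line: str):
    """Classify the first line of a SIP message.

    Returns ("request", method, uri) or ("response", code, reason).
    """
    parts = line.strip().split(" ", 2)
    if len(parts) < 2:
        raise ValueError("not a SIP start line")
    if parts[0] == "SIP/2.0":
        # Response: version, three-digit code, human-readable reason.
        reason = parts[2] if len(parts) > 2 else ""
        return ("response", int(parts[1]), reason)
    if len(parts) == 3 and parts[0].isupper() and parts[2] == "SIP/2.0":
        # Request: method in capitals, request URI, version.
        return ("request", parts[0], parts[1])
    raise ValueError("not a SIP/2.0 start line")


# Hypothetical URI, used only for illustration.
print(classify_start_line("INVITE sip:bob@example.com SIP/2.0"))
print(classify_start_line("SIP/2.0 180 Ringing"))
```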


INVITE sip:[email protected] SIP/2.0
Via: SIP/2.0/UDP 192.168.0.173:5060;branch=z9hG4bK-sE8jVo237HrkbYd4
From: <sip:[email protected]>;tag=MYTAG0123
To: <sip:[email protected]>
Call-ID: [email protected]
CSeq: 3 INVITE
Contact: <sip:[email protected]:62684>
Content-length: 188
Content-type: application/sdp
Max-Forwards: 70

v=0
o=- 100022873 100022914 IN IP4 145.94.34.137
s=Michels MSc project
c=IN IP4 145.94.34.137
t=0 0
m=audio 1600 RTP/AVP 3
a=rtpmap:3 GSM/8000
a=sendrecv
a=direction: active

Figure 2.1: SIP request example

SIP/2.0 200 OK
Via: SIP/2.0/UDP 192.168.0.173:5060;received=145.94.34.137;branch=z9hG4bK-sE8jVo237HrkbYd4;rport=62797
From: <sip:[email protected]>;tag=MYTAG0123
To: <sip:[email protected]>;tag=as271c7f01
Call-ID: [email protected]
CSeq: 3 INVITE
Max-Forwards: 70
Contact: <sip:[email protected]:5065>
Content-Type: application/sdp
Content-Length: 334

v=0
o=root 14256 14256 IN IP4 82.173.230.75
s=session
c=IN IP4 82.173.230.75
t=0 0
m=audio 42256 RTP/AVP 18 97 3 0 101
a=rtpmap:18 G729/8000
a=fmtp:18 annexb=no
a=rtpmap:97 iLBC/8000
a=rtpmap:3 GSM/8000
a=rtpmap:0 PCMU/8000

Figure 2.2: SIP response example

The most important SIP header fields are discussed below.

• Via Each time a message is forwarded by a proxy server, a Via field is added. A Via field contains the IP address and port number at which the sender/forwarder wants to receive the response. The Via field makes sure the reply follows exactly the same path as the request.

• To This is the logical destination of the SIP message.

• From This is the logical source of the SIP message. If the source wants to remain anonymous, the display name "Anonymous" should be used along with a syntactically correct but unusable URI, for example: sip:[email protected].

• Contact Contains a URI, usually a URL, at which the original sender can be reached. This can be used for further SIP communication.


• CSeq This field is used to order transactions. It contains an integer number and the method to which the message is related. The number is incremented for every newly generated request.

• Call-ID Transactions can be bundled using the Call-ID field. When registering, a Call-ID is generated. When registering with authentication (see below), the two subsequent REGISTER messages should have the same Call-ID. Another example is bundling the transactions that correspond to the same phone call. The INVITE message, the BYE message and all other messages related to the phone call have the same Call-ID.

• Content-Length This field contains the length of the body in octets. If there is no body content, this field should be 0.

• Content-Type This header field is present only if the message contains a body. Values for this field are MIME strings⁴. The content is nearly always SDP (Session Description Protocol), which has the MIME type application/sdp.
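The header/body structure and the Content-Length field described above can be illustrated with a small parser. This is a simplified sketch under stated assumptions: the function name `parse_sip_message` and the sample messages are hypothetical, header names are compared case-insensitively, and folded (multi-line) header values are not handled.

```python
def parse_sip_message(raw: bytes):
    """Split a SIP message into start line, header fields and body."""
    # The header is separated from the body by one blank line (CR/LF).
    head, _, body = raw.partition(b"\r\n\r\n")
    lines = head.decode("utf-8").split("\r\n")
    start_line = lines[0]
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        # A field such as Via may occur several times, so keep a list.
        headers.setdefault(name.strip().lower(), []).append(value.strip())
    declared = int(headers.get("content-length", ["0"])[0])
    if declared != len(body):
        raise ValueError("Content-Length does not match the body size")
    return start_line, headers, body


# Hypothetical message, used only for illustration.
message = (b"INVITE sip:bob@example.com SIP/2.0\r\n"
           b"CSeq: 3 INVITE\r\n"
           b"Content-Type: application/sdp\r\n"
           b"Content-Length: 5\r\n"
           b"\r\n"
           b"v=0\r\n")
start, fields, body = parse_sip_message(message)
print(start, fields["cseq"], len(body))
```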

2.2.3 SIP Authentication

Not every SIP server accepts any REGISTER request, because registration is a service that may have to be paid for. Only users that are known to the system are able to REGISTER, and their identity has to be verified by means of authentication using a username and password. When using authentication, a password needs to be sent to the SIP server. The password is not encrypted, but hashed, and is therefore easy to crack. To overcome this problem, a secure data transfer has to be set up. The DVS Phone does not support this functionality. The authentication method is the same as for HTTP when a restricted webpage is requested and is described in RFC 2617. The hashing procedure is as follows. There is a hashing function H(x) → y, which calculates from a text input of any size a 128-bit number that is converted to a 32-character hexadecimal string. At the moment the MD5 hashing method is the most common, though other hashing functions can be used according to RFC 2617. Therefore, the DVS Phone uses the MD5 function to hash the password when authenticating. For SIP registration with authentication four messages have to be sent, as depicted in Figure 2.3(b). The first is a normal REGISTER request. The answer of the SIP server is a 401 Unauthorized message, which contains a WWW-Authenticate field. This field contains several parameters, among others realm and nonce. These values are used to hash the password in the way described below. The hashing result is returned in a new REGISTER request, to which an Authorization field is added whose response parameter holds the hashing result (all other parameters used for hashing, except the password, are sent along). The SIP server checks the response and, if it is correct, responds with a 200 OK message, after which the User Agent is registered at the SIP server. The response is produced by the following algorithm, which is called the digest operation:

⁴ MIME (Multipurpose Internet Mail Extensions) types are text strings originally invented to indicate the type of email (SMTP) parts; in HTTP and SIP they are also used to denote the body type. A list of registered MIME types is maintained by the IANA (Internet Assigned Numbers Authority).


[Figure 2.3: sequence diagrams between SIP client and SIP server. (a) without Authentication: REGISTER, 200 OK. (b) with Authentication.]

Figure 2.3: SIP REGISTER request sequences

The first step is hashing a string that is constructed from the username, the realm and the user password. The realm is given by the SIP server in the 401 Unauthorized message. A1 is the hashing result of this string. A1 contains 32 hexadecimal characters with the letters in lower case.

A1 = H(username ":" realm ":" password)

The second step is hashing a string that is constructed from the method and a URI. If the SIP server did not pass a URI value in the 401 Unauthorized message, the URI is equal to the realm, with "sip:" in front. A2 is the hexadecimal hashing result and contains 32 characters.

A2 = H(method ":" uri)

In the last step, the response is constructed from A1, A2 and a value called nonce. This value is also given by the SIP server in the 401 Unauthorized message and is a random value generated by the SIP server every time a REGISTER attempt is made.

response = H(A1 ":" nonce ":" A2)

The response is placed in the Authorization field as a parameter, along with the other information used for the digest operation. An example of the resulting Authorization field is:

Authorization: Digest username="17476383971",realm="proxy01.sipphone.com",
nonce="43f30d1a8a774c918aa2843c60cf13a312fb5c47",uri="sip:proxy01.sipphone.com",
response="290f15374e97c994a8399d56882cf7dd",algorithm=MD5
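The three hashing steps of the digest operation can be written down directly with an MD5 implementation. The sketch below follows the naming of this section, in which A1 and A2 denote the hashed strings; the function names and the sample credentials are assumptions made for illustration, and the optional qop and cnonce extensions of RFC 2617 are not covered.

```python
import hashlib


def H(text: str) -> str:
    """MD5 hash of the text as 32 lower-case hexadecimal characters."""
    return hashlib.md5(text.encode("utf-8")).hexdigest()


def digest_response(username: str, realm: str, password: str,
                    method: str, uri: str, nonce: str) -> str:
    a1 = H(f"{username}:{realm}:{password}")  # step 1
    a2 = H(f"{method}:{uri}")                 # step 2
    return H(f"{a1}:{nonce}:{a2}")            # step 3


# Hypothetical credentials and nonce, used only for illustration.
response = digest_response("alice", "example.com", "secret",
                           "REGISTER", "sip:example.com", "abc123")
print(f'Authorization: Digest username="alice",realm="example.com",'
      f'nonce="abc123",uri="sip:example.com",'
      f'response="{response}",algorithm=MD5')
```

Because the nonce changes with every REGISTER attempt, a captured response cannot simply be replayed for a later attempt.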

2.2.4 Setting up a SIP connection: Inviting a friend

When a User Agent is registered at a SIP Registrar server, its URI is known at that SIP server. The User Agent can be found by any other SIP User Agent using this URI. Figure 2.4 depicts the communication between two User Agents that want to set up a connection. We call the User Agent that initiates the call setup the inviter. The User Agent that is being called is the invitee. The initial communication is sent via a SIP Proxy server, because the inviter does not know the URI at which the invitee can currently be reached. The proxy server does, and once the OK message is received by the inviter, the URI of the invitee is known, because the OK message contains the invitee's URI in the Contact header field. Consequently, the ACK request (which by exception does not generate a response message) is sent directly to the invitee. The ACK request also includes a Contact header field. As a result, the inviter and invitee both have each other's URIs, making direct SIP communication possible.

[Figure 2.4: sequence diagram between SIP UAC, SIP server and SIP UAS. Messages: INVITE, 100 Trying, 180 Ringing, 200 OK, ACK, followed by the Media Session.]

Figure 2.4: SIP INVITE sequence

During call setup, the codec to be used is negotiated. At this point SDP (Session Description Protocol), as defined in RFC 2327 [14], plays an important role. SDP operates in the bodies of the SIP messages. This is fully compliant with the SIP standard, which just specifies that "SIP messages MAY contain a body". As depicted in Figure 2.1, the INVITE message contains a body of type SDP. The type of the body is specified in the header with the Content-Type field. The text application/sdp indicates that the body contains information on the session negotiation (provided, of course, that the Content-Length header field is greater than 0). The INVITE message carries the codecs supported by the inviter. The invitee adds its supported codecs to the 200 OK response message. Both list all supported codecs in order of preference. By default the first listed codec should be used. In practice, all VoIP implementations support the G.711 codec (see Section 2.4). As a result, there is always a codec that both User Agents support.
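The codec lists are carried in the m=audio lines of the SDP bodies, as payload type numbers after the RTP/AVP keyword. A minimal sketch of the selection is given below. It assumes that the first payload type offered by the inviter that also appears in the answer is chosen; the text above only states that the first listed codec should be used by default, so this tie-breaking rule, like the function names, is an assumption.

```python
def audio_payload_types(sdp: str):
    """Return the payload types of the first m=audio line, in order."""
    for line in sdp.splitlines():
        if line.startswith("m=audio "):
            # Format: m=audio <port> <transport> <payload types...>
            return [int(pt) for pt in line.split()[3:]]
    return []


def choose_codec(offer_sdp: str, answer_sdp: str):
    """Pick the first offered payload type that the answer also lists."""
    answered = set(audio_payload_types(answer_sdp))
    for pt in audio_payload_types(offer_sdp):
        if pt in answered:
            return pt
    return None


# The m= lines of Figure 2.1 (offer) and Figure 2.2 (answer).
offer = "v=0\nm=audio 1600 RTP/AVP 3\na=rtpmap:3 GSM/8000\n"
answer = "v=0\nm=audio 42256 RTP/AVP 18 97 3 0 101\n"
print(choose_codec(offer, answer))  # payload type 3 is GSM
```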

Once the SIP connection has to be torn down, the BYE procedure depicted in Figure 2.5 is executed. Because both User Agents have received each other's URI (in the Contact header field) at SIP connection setup, each of them is able to send a SIP BYE request directly to the other User Agent. The User Agent that receives the BYE request acts as User Agent Server and responds with a 200 OK message. This ends the communication.

Figure 2.5: SIP BYE sequence


2.3 RTP: Real-time Transport Protocol

The Real-time Transport Protocol (RTP) is defined in RFC 3550 [27] and is a simple protocol to carry real-time data. The RFC also defines a companion protocol, RTCP (RTP Control Protocol), that gives feedback to other RTP session members on the QoS parameters of the connection. The RTP protocol is independent of the transport protocol and network protocol it runs on. Normally, RTP runs on top of UDP/IP, which is also the case in this project. In the following, the RTP protocol is described first and the RTCP protocol second.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |       sequence number         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           synchronization source (SSRC) identifier            |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 2.6: RTP header (source: RFC 3550 by IETF)

The RTP header is defined according to Figure 2.6. The most important fields are the PT (Payload Type) field, the sequence number and the timestamp. Only these fields are given a meaningful value in the implementation. They are briefly described in the following. The Payload Type is a number that indicates how the payload (the transported real-time data) must be interpreted. A list of standard payload types is available in RFC 3551. From this list only 0: PCMU (µLaw companding) and 3: GSM are implemented in the DVS Phone. The sequence number is used to detect the sending order of the RTP packets. Packets may arrive out of order, since sending RTP packets over UDP/IP does not guarantee in-order arrival. When an out-of-order receipt of packets is detected, the packets can be re-ordered according to the sequence number. The sequence number starts at a random value and is increased by one for every subsequent RTP packet. The timestamp field is used to determine at what time a packet was recorded. The timestamp value has no unit; however, the resolution of this counter should be high enough to determine with certainty when a packet has to be played back. In real-time audio, this counter usually is incremented for every recorded sample. This means that if an RTP packet contains data for 160 audio samples (which does not say anything about the size of the payload in bytes), the counter is increased by 160 for every subsequent RTP packet.
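The fixed 12-byte header of Figure 2.6 can be packed with a few bit operations. The sketch below is illustrative only: the function names are assumptions, the P, X and CC fields are fixed to zero, and the starting values in the example are arbitrary rather than random.

```python
import struct


def pack_rtp_header(payload_type: int, sequence_number: int,
                    timestamp: int, ssrc: int, marker: bool = False) -> bytes:
    """Build the 12-byte RTP header with V=2, P=0, X=0 and CC=0."""
    byte0 = 2 << 6
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    # "!" selects network byte order: two bytes, one 16-bit field,
    # then two 32-bit fields.
    return struct.pack("!BBHII", byte0, byte1,
                       sequence_number & 0xFFFF,
                       timestamp & 0xFFFFFFFF,
                       ssrc & 0xFFFFFFFF)


def unpack_rtp_header(data: bytes) -> dict:
    byte0, byte1, seq, ts, ssrc = struct.unpack("!BBHII", data[:12])
    return {"version": byte0 >> 6, "marker": bool(byte1 >> 7),
            "payload_type": byte1 & 0x7F, "sequence_number": seq,
            "timestamp": ts, "ssrc": ssrc}


# Two consecutive GSM packets (payload type 3) of 160 samples each:
# the sequence number grows by 1 and the timestamp by 160.
first = pack_rtp_header(3, 1000, 48000, 0x1234ABCD)
second = pack_rtp_header(3, 1001, 48000 + 160, 0x1234ABCD)
print(unpack_rtp_header(first))
print(unpack_rtp_header(second))
```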

One remark must be made about the re-ordering of packets according to the sequence number. The DVS Phone does not have packet re-ordering implemented. If a packet arrives before another packet that was sent earlier, the late packet is discarded.
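The discard policy can be sketched as a comparison of sequence numbers. How the DVS Phone performs this comparison is not described here, so the 16-bit wrap-around handling below, like the function names, is an assumption.

```python
def is_newer(seq: int, last_seq: int) -> bool:
    """True if seq lies ahead of last_seq in 16-bit modular arithmetic."""
    diff = (seq - last_seq) & 0xFFFF
    return 0 < diff < 0x8000


def drop_late_packets(sequence_numbers):
    """Keep only packets that are newer than the last accepted one."""
    accepted = []
    last = None
    for seq in sequence_numbers:
        if last is None or is_newer(seq, last):
            accepted.append(seq)
            last = seq
        # Otherwise the packet arrived too late and is discarded.
    return accepted


print(drop_late_packets([10, 11, 13, 12, 14]))  # packet 12 is discarded
```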

The RTCP protocol is used to control and inform other participants in an RTP session. The DVS Phone only has implemented the Sender Report (besides the Receiver Report), which informs other participants about the receive statistics, for instance, the last received RTP timestamp, the cumulative packet loss, the highest sequence number received, and the interarrival jitter. The DVS Phone sends a Sender Report every 5 seconds when in the Talking state. When the DVS Phone receives a Sender Report, the cumulative packet loss and the interarrival jitter values are displayed on the VGA screen to inform the user.

2.4 Voice compression

In communications involving voice, a sample rate of 8 kHz is adequate to achieve clear and understandable speech. Utilizing 16 bits per sample (PCM sampling), the data rate can be calculated as follows: 8 × 10³ × 16 = 128 kb/s. This bitrate is a fraction of the average broadband Internet connection most ISPs offer. However, the use of VoIP will increase in the future and, as a result, the total amount of data transported over the Internet also increases. Therefore, reducing bandwidth using compressing codecs is advantageous, as more conversations can be carried over the same line. Codecs are specifications of how audio data must be interpreted and how the data can be transformed to a PCM signal. Most codecs are algorithms used to reduce the bitrate of speech data considerably, at the expense of some voice quality. Since most codecs are compressing, we use the word codec for voice compression techniques. Some of these codecs are listed in Table 2.2 [6]. For every codec, the kind of algorithm is stated, as well as the frame size and lookahead (if needed for frame-based codecs), the bitrate and the MOS rate. The MOS (Mean Opinion Score) [28] rate is a common measure for the quality of a codec⁵. The G.xxx suite consists of recommendations of the ITU (International Telecommunication Union), and the GSM xx.xx standards are promoted by the ETSI (European Telecommunications Standards Institute).

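Codec | Algorithm | Frame size/lookahead (ms) | Bitrate (kb/s) | MOS rate
------|-----------|---------------------------|----------------|---------
G.711 | PCM | 0.125 | 64 | 4.1
G.723.1 | ACELP | | 5.3 | 3.7
G.726 | ADPCM | 0.125 | 32 | 3.9
G.728 | LD-ACELP | 3-5/0 | 16 | 3.9
G.729 | CS-ACELP | 10/5 | 8 | 3.9
GSM 06.10 | RPE-LTP | 20/0 | 13.2 | 3.5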

Table 2.2: Codec characteristics

⁵ The MOS rate is a number from 1 to 5, where 1=bad, 2=poor, 3=fair, 4=good, 5=excellent. The MOS rate is determined by a test group that performs listening tests.

The G.711, G.723, G.726, G.728, G.729, and GSM 06.10 codecs are briefly explained below. G.711 is the least complex codec. It converts any 16-bit sample to an 8-bit value via non-linear quantization. Normally, quantization levels are equally distributed; however, for soft signals this results in granular noise. With non-linear quantization, the quantization levels are dense for soft signals and sparse for louder signals. As a result, the SNR (Signal to Noise Ratio) for soft signals is increased, whereas the SNR for loud signals is slightly decreased. This process is called companding: the signal is compressed before transmitting and expanded after receiving. Two different distributions of non-linear quantization levels exist, which are called aLaw and µLaw companding. The G.711 codec with µLaw companding is implemented in the DVS Phone, because it is the codec to fall back on when the inviter and invitee have no other supported codec in common: every known VoIP implementation supports the G.711 (aLaw and µLaw) codec. The G.726 is an ADPCM (Adaptive Differential Pulse Code Modulation) codec, which uses the difference between samples. These differences are used to predict the next sample value. The error values of this prediction are sent to the decoder. The G.723, G.728 and G.729 are CELP (Code Excited Linear Prediction) codecs and belong to the family of Linear Prediction codecs. Codebooks (mostly a fixed and an adaptive codebook) are used for excitation of a linear predictive filter. The GSM 06.10⁶ codec is an RPE-LTP (Regular Pulse Excited Long Term Prediction) coder and also belongs to the family of Linear Prediction codecs. In contrast to the CELP codecs, the GSM 06.10 uses regular pulses for excitation of a linear predictive filter. The CELP and GSM codecs have proven that, using the Linear Prediction technique, large bitrate reductions can be achieved while maintaining good speech quality.
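The effect of the companding described for G.711 above can be illustrated with the continuous µ-law characteristic (µ = 255). Note that this is only an approximation for illustration: G.711 itself specifies a segmented, table-like encoding of the curve, which is not reproduced here, and all function names are assumptions.

```python
import math

MU = 255.0


def compress(x: float) -> float:
    """Continuous mu-law compression of a sample x in [-1, 1]."""
    return math.copysign(math.log1p(MU * abs(x)) / math.log1p(MU), x)


def expand(y: float) -> float:
    """Inverse of compress()."""
    return math.copysign(math.expm1(abs(y) * math.log1p(MU)) / MU, y)


def mulaw_roundtrip(x: float, levels: int = 127) -> float:
    """Compress, quantize uniformly to 8 bits (sign + 7 bits), expand."""
    code = round(compress(x) * levels)
    return expand(code / levels)


def uniform_roundtrip(x: float, levels: int = 127) -> float:
    """Plain uniform 8-bit quantization, for comparison."""
    return round(x * levels) / levels


soft = 0.01
print(abs(mulaw_roundtrip(soft) - soft), abs(uniform_roundtrip(soft) - soft))
```

For the soft sample the companded quantization error is far smaller than the uniform one, which is the SNR gain for soft signals mentioned above.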

The DVS Phone uses the GSM 06.10 codec for voice compression. Section 2.4.1 explains why the GSM codec is chosen. Sections 2.4.2 and 2.4.3 explain how speech compression in GSM works.

2.4.1 Codec choice

The GSM codec is used in cellular phones all over the world. The codec can be freely used, because it is not patented. However, the better (lower bitrate vs. higher MOS rate) G.723, G.728 and G.729 algorithms are covered by patents; as a result, the specifications of the algorithm and the implementation require payment. The GSM codec has been chosen for implementation due to its wide usage and good documentation. Moreover, the algorithm is freely available from the ETSI [10]. The GSM source code used is from [3], from the Technical University of Berlin. In addition, the G.729 and the GSM codec have some calculations in common, since they both are Linear Prediction codecs. They only differ in the way the Linear Prediction Filters (the Short Term Filter, as will be explained in the next section) are excited. The G.729 (CELP codec) filter is excited by codebook pulses, and the GSM (RPE-LTP codec) filter is excited by regular pulses. Therefore, some of the results of this project also apply to the G.729 codec. The next two sections describe the GSM encoder and decoder, respectively.

2.4.2 GSM encoder

A block scheme of the encoder algorithm is depicted in Figure 2.7. The fragment of voice depicted in Figure 2.9(a) is used as an example. The samples -159..0 represent the previous frame that is already encoded and the samples 1..160 represent the current frame to be encoded. Although the frames are non-overlapping, data from the previous frame is used to encode the current frame. The steps for encoding are described below. The blue numbers between brackets correspond to the blocks in Figure 2.7.

⁶ The GSM 06.10 is also known as GSM FR (Full Rate). Other commonly used GSM codec standards are GSM 06.20 (Half Rate), GSM 06.60 (Enhanced Full Rate), and GSM 06.90 (Adaptive Multi-Rate).


[Figure 2.7: block diagram. Blocks: Pre-processing; Short term LPC analysis; Short term analysis filter; LTP analysis; Long term analysis filter; RPE grid selection and coding; RPE grid decoding and positioning. The numbers in blue (1..7) give the order of execution of the algorithm.]

Signals: d: Short term residual; e: Long term residual (40 samples); dpp: Short term residual estimate (40 samples); dp: Reconstructed short term residual (40 samples); x: Quantized long term residual (40 samples).

Output parameters: 8 reflection coefficients coded as Log Area Ratios, LARc0..7 (36 bits/20 ms); 8 LTP parameters, N0..3 and b0..3 (9 bits/5 ms); 60 RPE parameters, M0..3, Xmax0..3 and x0..3[0..12] (47 bits/5 ms).

Figure 2.7: GSM Encoder (source: GSM: Digital Cellular Telecommunications System (Phase 2); Full Rate Speech; Part 2: Transcoding (GSM 06.10 version 4.2.0) by ETSI, page 13)
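The bit counts annotated in Figure 2.7 add up to the size of one encoded frame. The short calculation below works this out. Rounding the frame up to whole bytes is an assumption made here to relate the result to the 13.2 kb/s listed in Table 2.2; the text does not state how that figure was obtained.

```python
LAR_BITS_PER_FRAME = 36     # reflection coefficients, once per 20 ms
RPE_BITS_PER_SUBFRAME = 47  # RPE parameters, once per 5 ms
LTP_BITS_PER_SUBFRAME = 9   # LTP parameters, once per 5 ms
SUBFRAMES_PER_FRAME = 4
FRAME_MS = 20


def gsm_frame_bits() -> int:
    per_subframe = RPE_BITS_PER_SUBFRAME + LTP_BITS_PER_SUBFRAME
    return LAR_BITS_PER_FRAME + SUBFRAMES_PER_FRAME * per_subframe


def bitrate_kbps(bits: int, duration_ms: int) -> float:
    # Bits per millisecond equals kilobits per second.
    return bits / duration_ms


bits = gsm_frame_bits()
whole_bytes = -(-bits // 8)  # ceiling division
print(bits, bitrate_kbps(bits, FRAME_MS))
print(whole_bytes, bitrate_kbps(whole_bytes * 8, FRAME_MS))
```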

The first stage (1) in the encoding process is a preprocessing operation that includes an offset cancellation and a pre-emphasis filter⁷. The second stage (2) is calculating the Linear Prediction Coefficients (LPC), which are sometimes called reflection coefficients after the vocal tract model. The coefficients are destined for the Short Term Analysis Filter (3), which is a lattice filter and is depicted in Figure 2.8. The filter coefficients are obtained using the autocorrelation function and, subsequently, the Schur recursion algorithm. These coefficients are quantized and coded in the parameters LARc0..LARc7 (Log Area Ratio coefficients) and subsequently decoded again, because the decoder will eventually use these decoded coefficients to reconstruct the original signal. Therefore, the original signal needs to be filtered with the reconstructed coefficients.
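How reflection coefficients follow from the autocorrelation function can be sketched in floating point. Note the substitution: the sketch uses the Levinson-Durbin recursion instead of the Schur recursion named above. In exact arithmetic both yield the same reflection coefficients; the Schur form is generally preferred for fixed-point implementations. The sign convention (a positively correlated signal gives a negative first coefficient) and the function names are assumptions, and quantization to Log Area Ratios is omitted.

```python
def autocorrelation(samples, max_lag):
    """acf[k] = sum over i of samples[i] * samples[i - k]."""
    n = len(samples)
    return [sum(samples[i] * samples[i - k] for i in range(k, n))
            for k in range(max_lag + 1)]


def reflection_coefficients(acf, order=8):
    """Levinson-Durbin recursion returning `order` reflection coefficients."""
    a = [1.0] + [0.0] * order  # prediction error filter coefficients
    err = float(acf[0])        # prediction error energy
    coefficients = []
    for m in range(1, order + 1):
        if err <= 0.0:
            # Silent or perfectly predictable input: remaining values are 0.
            coefficients.extend([0.0] * (order - m + 1))
            break
        acc = acf[m] + sum(a[i] * acf[m - i] for i in range(1, m))
        k = -acc / err
        coefficients.append(k)
        updated = a[:]
        for i in range(1, m):
            updated[i] = a[i] + k * a[m - i]
        updated[m] = k
        a = updated
        err *= 1.0 - k * k
    return coefficients


# Autocorrelation of an idealized first-order signal: acf[k] = 0.5 ** k.
print(reflection_coefficients([0.5 ** k for k in range(9)]))
```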

[Figure 2.8: lattice filter diagram with input s[k] and output d[k], eight stages with intermediate signals d0..d7 and u0..u7, delay elements T, and reflection coefficients r0..r7.]

Figure 2.8: Short Term Analysis Filter

⁷ 1st order low-cut filter with a cutoff frequency of 100 Hz.

Typically, the main frequencies of the voice frame are suppressed by the Short Term Analysis Filter. Figure 2.9(b) clarifies this operation by depicting the spectrum of the original voice signal and the characteristic of the Short Term Analysis Filter. After the original signal is filtered by the Short Term Analysis Filter, the Short Term Residual signal (d) is left. Although the Short Term Analysis Filter has suppressed the frequencies that constitute the different vowels, the Short Term Residual signal still contains the speaker-specific sound. Discarding this signal would produce understandable, but robotic, speech at the decoder.
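The lattice structure of Figure 2.8 can be sketched as a loop over the eight stages. This is a floating-point illustration only: the function name is an assumption, and the fixed-point arithmetic and the gradual change of coefficients at the start of a frame are omitted.

```python
def short_term_analysis(samples, r):
    """Run samples through a lattice filter with reflection coefficients r."""
    u = [0.0] * len(r)  # contents of the delay elements T
    residual = []
    for s in samples:
        d = s       # forward signal entering stage 0
        feed = s    # value to be stored in the next delay element
        for i, ri in enumerate(r):
            delayed = u[i]
            u[i] = feed
            feed = delayed + ri * d  # backward signal of this stage
            d = d + ri * delayed     # forward signal of this stage
        residual.append(d)
    return residual


# A decaying signal s[k] = 0.5 ** k is predicted perfectly by r0 = -0.5,
# so only the very first sample remains in the residual.
signal = [0.5 ** k for k in range(6)]
print(short_term_analysis(signal, [-0.5] + [0.0] * 7))
```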

[Figure 2.9(a): Original (above) and Short Term Residual signal (below); horizontal axis: samples.]

[Figure 2.9(b): Spectrum of original signal, Short Term Residual and Short Term Analysis Filter; horizontal axis: Hz; curves: Original voice, Short Term Analysis Filter, Short Term Residual.]

Figure 2.9: Short Term Analysis

The original signal and the Short Term Residual signal are depicted in the time domain in Figure 2.9(a). To avoid distortion caused by abrupt changes in the filter coefficients, the first 40 samples of a frame are used to gradually change the filter coefficients from the values of the previous frame to the values of the current frame. After Short Term Filtering, the Short Term Residual signal (d) still consists of 160 samples and is split up into four equal subframes of 40 samples. The four subframes of residual samples are fed to the Long Term Prediction (LTP) analysis and the Regular Pulse Excitation (RPE) encoder to obtain LTP and RPE parameters for each subframe.

Determining the LTP parameters (4) is described in the following. The Short Term Residual is still periodic, as depicted in Figure 2.9(a). This periodicity is exploited by determining the lag parameter, which instructs the decoder to copy a part of the already reconstructed Short Term Residual signal. The lag parameter is obtained using the cross correlation function between the Short Term Residual signal (d) and the reconstructed Short Term Residual signal (dp), which can be expressed mathematically as:

$$c[k] = \sum_{i=0}^{39} d[i]\, \mathit{dp}[i + k], \qquad k = -120, \ldots, -40 \tag{2.1}$$

At the maximum of this cross correlation function, the two signals match best. This point is stored in the lag parameter. Figure 2.10(a) depicts this search, where the lag is -79. This corresponds to the maximum of the cross correlation function, which is denoted by a circle in Figure 2.10(b). This lag and the gain (the amplitude factor between the original and the lagged signal) are coded into the parameters Nj and bj, where j denotes the current subframe and is in the range [0..3].
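The lag search of Equation 2.1 can be sketched in a few lines of Python. This is only an illustration in floating-point arithmetic, not the fixed-point routine of the codec. The function names, the layout of the history buffer, and the least-squares form of the gain are assumptions made for this sketch.

```python
def ltp_lag(d, dp_history):
    """Return (lag, c_max) maximising c[k] = sum_i d[i] * dp[i + k], k = -120..-40.

    d          : the 40 Short Term Residual samples of the current subframe
    dp_history : the 120 most recent reconstructed residual samples, oldest
                 first, so dp[-120] is dp_history[0] and dp[-1] is dp_history[119]
    """
    assert len(d) == 40 and len(dp_history) == 120
    best_lag, best_c = -40, float("-inf")
    for k in range(-120, -39):                      # k = -120 .. -40 inclusive
        # dp[i + k] has a negative index; shift it into the history buffer
        c = sum(d[i] * dp_history[120 + i + k] for i in range(40))
        if c > best_c:
            best_lag, best_c = k, c
    return best_lag, best_c


def ltp_gain(d, dp_history, lag):
    """Unquantised gain (assumed least-squares form): c[lag] / energy of the lagged part."""
    lagged = [dp_history[120 + i + lag] for i in range(40)]
    energy = sum(x * x for x in lagged)
    if energy == 0:
        return 0.0
    return sum(d[i] * lagged[i] for i in range(40)) / energy


# Impulse train with a period of 79 samples: the search should find lag -79.
signal = [1000 if n % 79 == 0 else 0 for n in range(160)]
lag, c_max = ltp_lag(signal[120:], signal[:120])
gain = ltp_gain(signal[120:], signal[:120], lag)
```

For the impulse train in the example, the only non-zero product occurs at k = -79, which matches the lag shown in Figure 2.10.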


(a) Residual and reconstructed residual (Short Term Residual and Reconstructed Short Term Residual; horizontal axis in samples; marked lag=79)

(b) Cross correlation residual and reconstructed residual (horizontal axis: lag)

Figure 2.10: Short Term Analysis

The part dp[lag..lag + 39] is copied and attenuated (using the reconstructed gain value) by the Long Term Analysis filter (5) to form signal dpp, which is called the Long Term Estimate. This estimate is subtracted from the Short Term Residual (d), which leaves an error signal (e) between the estimate and the residual. The error signal is small for gradual changes in d; for abrupt changes, however, the error signal is large.

The signal (e) is fed to the RPE grid selection and encoding (6), which filters the signal with an 11th order perceptual FIR weighting filter (footnote 8). This filtered signal is downsampled by a factor of 3, which leaves four choices for the sample numbers (grids): 0..36, 1..37, 2..38, and 3..39, all with step size 3. The grid with the greatest power is chosen and coded in the Mj parameter. The chosen grid is coded with Adaptive Pulse Code Modulation (APCM), that is, by determining the maximum sample value in the RPE grid and normalizing the RPE samples. The maximum of the grid is stored in xmaxj and the relative sample values are stored in xj[0..12].
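As an illustration of the grid selection and normalization just described, the following Python sketch picks the grid with the greatest power and normalizes it by its maximum. The function names are hypothetical, and the quantization of the grid maximum and of the pulses, which a real encoder performs, is deliberately left out.

```python
def select_rpe_grid(e_weighted):
    """Return (M, grid): the offset 0..3 whose 13 samples carry the most power."""
    assert len(e_weighted) == 40
    # Offsets 0..3 with step 3 give the sample sets 0..36, 1..37, 2..38, 3..39
    grids = [e_weighted[m:m + 37:3] for m in range(4)]
    powers = [sum(x * x for x in g) for g in grids]
    m = max(range(4), key=lambda j: powers[j])
    return m, grids[m]


def apcm_normalise(grid):
    """Return (xmax, x): the largest magnitude and the samples relative to it."""
    xmax = max(abs(x) for x in grid)
    if xmax == 0:
        return 0, [0.0] * len(grid)
    return xmax, [x / xmax for x in grid]


# Example subframe: most of the power sits on samples 2 and 5 (grid 2).
e = [0] * 40
e[0], e[2], e[5] = 3, -4, 8
M, grid = select_rpe_grid(e)
xmax, x = apcm_normalise(grid)
```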

The last step for the encoder, after all parameters are obtained, is to reconstruct the Short Term Residual in the RPE grid decoding and positioning (7). The grid is decoded using xmaxj and xj[0..12], upsampled by a factor of 3 by inserting zeros, and positioned by Mj. The resulting signal x is a reconstruction of the error signal between the Short Term Residual (d) and the Long Term Estimate (dpp). Signal dpp is added to the reconstructed error signal to create the reconstructed Short Term Residual signal dp, which is used for encoding the next subframe.
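The decoding and positioning step can be sketched as follows, again in floating point and without the dequantization tables of the real codec. The function names and the example values are assumptions chosen for illustration.

```python
def position_rpe_grid(m, xmax, x_norm):
    """Rebuild a 40-sample error signal: scale the 13 pulses and insert zeros."""
    assert 0 <= m <= 3 and len(x_norm) == 13
    e_rec = [0.0] * 40
    for j, x in enumerate(x_norm):
        e_rec[m + 3 * j] = x * xmax       # pulse j lands on sample m + 3j
    return e_rec


def reconstruct_residual(e_rec, dpp):
    """dp = reconstructed error signal + Long Term Estimate."""
    return [e + p for e, p in zip(e_rec, dpp)]


# Two non-zero pulses on grid 2, added to a constant Long Term Estimate.
e_rec = position_rpe_grid(2, 8, [-0.5, 1.0] + [0.0] * 11)
dp = reconstruct_residual(e_rec, [1.0] * 40)
```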

The output of the encoder for a 160-sample frame consists of 8 LPC coefficients and, for each 40-sample subframe, an LTP lag, an LTP gain, a grid position, a grid maximum, and 13 RPE pulses. Altogether, there are 76 parameters to send, which are packed in 33 bytes.
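The parameter count and frame size can be checked with a short calculation. The bit budgets used here (36 bits of Log-Area Ratios per 20 ms frame, 47 bits of RPE parameters and 9 bits of LTP parameters per 5 ms subframe) are the ones printed in the decoder diagram of Figure 2.11. The variable names are chosen for this sketch only.

```python
LAR_BITS_PER_FRAME = 36        # 8 reflection coefficients, per 20 ms
RPE_BITS_PER_SUBFRAME = 47     # grid position, grid maximum, 13 pulses, per 5 ms
LTP_BITS_PER_SUBFRAME = 9      # lag and gain, per 5 ms
SUBFRAMES = 4
FRAME_MS = 20

# lag, gain, grid position, grid maximum, and 13 pulses per subframe
params_per_subframe = 1 + 1 + 1 + 1 + 13
total_params = 8 + SUBFRAMES * params_per_subframe

total_bits = LAR_BITS_PER_FRAME + SUBFRAMES * (
    RPE_BITS_PER_SUBFRAME + LTP_BITS_PER_SUBFRAME)
packed_bytes = -(-total_bits // 8)                 # round up to whole bytes
spare_bits = packed_bytes * 8 - total_bits
bit_rate = total_bits * 1000 // FRAME_MS           # bits per second
```

The result is 76 parameters in 260 bits, which needs 33 bytes and leaves 4 bits unused by codec parameters. At one frame per 20 ms this corresponds to 13 kbit/s.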

8 This is a low-pass filter with a cutoff frequency of 1500 Hz


2.4.3 GSM decoder

Because the encoder is designed to relate signals to the decoded signal, the encoder contains parts of the decoder. Decoding a GSM frame consists of two parts: reconstructing the Short Term Residual signal (dp) and filtering this residual signal using the Short Term Synthesis Filter. A block scheme of the decoder is depicted in Figure 2.11. The numbers in parentheses below again correspond to the blue numbers in Figure 2.11.

Blocks, in order of execution (blue numbers): (1) RPE grid decoding and positioning, (2) Long term synthesis filter, (3) Short term synthesis filter, (4) Post processing, which produces the output signal.

Input parameters (sound coding parameters):
8 reflection coefficients coded as Log-Area Ratios (36 bits/20 ms): LARc0..7
60 RPE parameters (47 bits/5 ms): M0..3, Xmax0..3, x0..3[0..12]
8 LTP parameters (9 bits/5 ms): N0..3, b0..3

Figure 2.11: GSM Decoder (source: GSM: Digital Cellular Telecommunications System (Phase 2); Full Rate Speech; part 2: Transcoding (GSM 06.10 version 4.2.0) by ETSI, page 14)

The RPE grid decoding and positioning (1) and the Long Term Synthesis Filter (2) are also present in the encoder and reconstruct the Short Term Residual signal (dp) exactly as the encoder does. The third step is filtering (3) the Short Term Residual signal (dp) using the Short Term Synthesis Filter. The Short Term Synthesis Filter is the inverse of the Short Term Analysis Filter; consequently, it is called an inverse lattice filter. The filter geometry is depicted in Figure 2.12.
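To make the inverse relation concrete, the sketch below implements an 8-stage lattice analysis filter and the matching inverse lattice (synthesis) filter in floating point and checks that the second undoes the first. It assumes fixed reflection coefficients for the whole signal and omits the fixed-point arithmetic and the coefficient interpolation of the actual codec. The names r, u, and v are chosen for this example.

```python
def short_term_analysis(s, r):
    """Lattice analysis filter: speech samples s -> residual d."""
    u = [0.0] * 8                     # u[i]: delayed input of stage i
    d = []
    for sample in s:
        di = ui = sample
        for i in range(8):
            u_delayed = u[i]
            u[i] = ui                 # remember this stage's input for the next sample
            ui = u_delayed + r[i] * di
            di = di + r[i] * u_delayed
        d.append(di)
    return d


def short_term_synthesis(d, r):
    """Inverse lattice filter: residual d -> reconstructed speech sr."""
    v = [0.0] * 9                     # v[i]: delayed values between the stages
    sr = []
    for sample in d:
        sri = sample
        for i in range(8, 0, -1):     # walk the stages in reverse order
            sri = sri - r[i - 1] * v[i - 1]
            v[i] = v[i - 1] + r[i - 1] * sri
        v[0] = sri
        sr.append(sri)
    return sr


# Round trip with arbitrary example coefficients (|r| < 1) and samples.
r = [0.5, -0.25, 0.125, 0.0, 0.25, -0.5, 0.1, 0.3]
s = [100, -50, 25, 0, 75, -20, 10, 5, 0, 0]
d = short_term_analysis(s, r)
sr = short_term_synthesis(d, r)
```

Each synthesis stage subtracts exactly the term that the corresponding analysis stage added, which is why sr reproduces s up to rounding.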

Figure 2.12: Short Term Synthesis Filter (input d[k], output sr[k]; eight lattice stages with reflection coefficients r0..r7, delay elements T, and internal signals d0..d7 and v0..v7)

The filter coefficients, coded in parameters LARc0..LARc7, are used to restore the main frequencies in the speech frame in order to reconstruct understandable speech. Figure 2.13 depicts the inverse filtering. Figure 2.13(a) depicts the reconstructed Short Term Residual and the reconstructed voice signal in the time domain. Figure 2.13(b) depicts these signals, plus the Short Term Synthesis Filter characteristic, in the frequency domain. These reconstructed signals can be compared with the original signals in Figure 2.9.

(a) Reconstructed Short Term Residual (above) and reconstructed voice (below); horizontal axis in samples

(b) Spectrum of reconstructed residual signal, reconstructed voice and Short Term Synthesis Filter; horizontal axis in Hz

Figure 2.13: Short Term Synthesis

The last step is postprocessing (4) the signal, which uses the inverse filter (footnote 9) of the preprocessing filter. The speech signal is now reconstructed.

2.5 Local networks

A Local Area Network (LAN) consists of a computer network of tens to hundreds of computers, where the geographical distances between the computers are within a hundred meters. The LAN can be connected to the global Internet or Wide Area Network (WAN). Local networks have two main advantages. First, Internet traffic can be controlled, and second, a whole network of computers can be connected to the Internet using one Internet connection.

2.5.1 NAT: Network Address Translation

When computers connected in a LAN want to access a host on the WAN, the router performs a typical procedure in order to distinguish the traffic of the different hosts in the LAN. The IP packet to be sent has a few attributes, e.g., source IP address and destination IP address. TCP and UDP add some extra attributes, e.g., source port number and destination port number. Port numbers are used to separate traffic for different services. Some services run on standard port numbers: HTTP works on port 80, Telnet on port 23, FTP on port 21, and SIP on port 5060. Another advantage of port numbers is that they can separate traffic for different hosts in a LAN with little administration. The router chooses a random, unused port number for any service a LAN host will use. An example of NAT is depicted in Figure 2.14.

9 1st order low-boost filter with a cutoff frequency of 100 Hz

LAN: hosts 192.168.0.2, 192.168.0.3 and 192.168.0.4; router LAN address 192.168.0.1
WAN: router public address 80.63.129.3; hosts 62.13.47.8 and 84.27.147.283; label SIP Server
Request on the LAN side: From 192.168.0.4:4749, To 62.13.47.8:5060
Request on the WAN side: From 80.63.129.3:2369, To 62.13.47.8:5060
Reply on the WAN side: From 62.13.47.8:5060, To 80.63.129.3:2369
Reply on the LAN side: From 62.13.47.8:5060, To 192.168.0.4:4749
NAT table: LAN IP 192.168.0.4, LAN port 4749, WAN IP 62.13.47.8, WAN port 5060, router public port 2369

Figure 2.14: Network Address Translation

When a request comes from a LAN host (192.168.0.4), the source port number (4749) is changed (to 2369) and the source IP address (192.168.0.4) is changed to the public IP address (80.63.129.3) of the router. This packet is sent over the Internet to the destination host (62.13.47.8). Once the host has processed the packet and generated a reply packet, the router receives this packet with source and destination IP addresses and port numbers exchanged (source: 62.13.47.8:5060; destination: 80.63.129.3:2369). Using the destination port number, the router decides to which LAN host and port number the reply has to be sent (the destination is changed to 192.168.0.4:4749). This replacing of addresses is called Network Address Translation (NAT).

However, it is impossible to make a table in advance that maps every possible local port number to a public port number. Instead, every time a LAN host makes a request to a host outside the LAN, an entry is created in the so-called NAT table. An entry consists of the LAN IP address and port and the WAN IP address and port, which together map onto a public port number of the router. If the router receives a packet on this port, and the source is the WAN IP address and port, it forwards the packet to the LAN host.
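The NAT table mechanism can be illustrated with a small Python model. This is a sketch with assumptions, not router code: the class and method names are hypothetical, public ports are handed out sequentially from 2369 (instead of randomly) so that the example reproduces the numbers of Figure 2.14, and 198.51.100.7 is a made-up second WAN host.

```python
class Nat:
    """Toy NAT: maps a public port to (lan_ip, lan_port, wan_ip, wan_port)."""

    def __init__(self, public_ip, first_port=2369):
        self.public_ip = public_ip
        self.next_port = first_port   # sequential allocation, for reproducibility
        self.table = {}

    def outbound(self, src, dst):
        """Rewrite the source of a LAN packet; create a table entry if needed."""
        wanted = (src[0], src[1], dst[0], dst[1])
        for public_port, entry in self.table.items():
            if entry == wanted:
                return (self.public_ip, public_port), dst
        public_port = self.next_port
        self.next_port += 1
        self.table[public_port] = wanted
        return (self.public_ip, public_port), dst

    def inbound(self, src, dst):
        """Rewrite the destination of a WAN packet, or return None to drop it."""
        entry = self.table.get(dst[1])
        if dst[0] != self.public_ip or entry is None or entry[2:] != src:
            return None               # no matching entry: not forwarded
        return src, entry[:2]


nat = Nat("80.63.129.3")
request = nat.outbound(("192.168.0.4", 4749), ("62.13.47.8", 5060))
reply = nat.inbound(("62.13.47.8", 5060), ("80.63.129.3", 2369))
stranger = nat.inbound(("198.51.100.7", 5060), ("80.63.129.3", 2369))
```

The reply from the contacted host is translated back to 192.168.0.4:4749, while the packet from the unknown host finds no matching entry and is not forwarded.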

NAT is a good solution for connecting a whole network to the Internet using a single Internet connection with one public IP address. However, NAT has some problems that need extra effort to overcome.

2.5.2 The NAT problem

A host in the LAN does not "see" the presence of the router. It sends an IP packet with a certain destination address to the Internet gateway, with its own address as source address. The response packet has source address and destination address exchanged, as if the response packet was directly addressed to the host in the LAN. The NAT is not visible to any host in the LAN. This can be a problem if some host on the Internet (with a public IP address) wants to access a host in a LAN. When the host in the LAN has not sent any packet before, there is no entry in the NAT table and the router does not know to which host it has to send the packet. The packet is returned to the sender with the ICMP message "Host unreachable". This is a typical problem with LANs. Additionally, if the LAN host has sent packets before and an entry exists, the host in the LAN still does not know its public IP address and its port mapping. In the case of SIP this is needed, because the Contact header field needs a URI with the public IP address and port number in order to be found on the public Internet.

There exist four types of NAT: symmetrical, asymmetrical, full cone, half cone. There is another problem with the symmetrical NAT. This type of NAT is used because it has good security properties (other types of NAT are vulnerable to port scan attacks). In this type of NAT, the public source port chosen by the router depends not only on the source IP and port of the LAN host, but also on the destination IP and port (this is why it is called a symmetrical NAT). This means that once a port is opened by the LAN host, it is able to send packets to the WAN host, and the WAN host is able to send packets back to the LAN host. However, if another WAN host (in Figure 2.14 the server at 84.27.147.283) wants to access the same LAN host through the same port number (2369) at the router, the packet is blocked, because the port opened by the LAN host is only valid for that particular WAN host. Other WAN hosts cannot use the opened port.
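The difference in port allocation can be made explicit with a small sketch. It assumes a simplified model in which a non-symmetrical NAT keys its mapping on the LAN source only, whereas a symmetrical NAT keys it on source and destination together. The function name, the sequential port numbers, and the second destination 198.51.100.7:3478 are hypothetical.

```python
def allocate_ports(flows, symmetric, first_port=2369):
    """Assign a public port to each (src, dst) flow, in order of first use."""
    ports = {}                         # mapping key -> public port
    assigned = {}                      # (src, dst) -> public port
    for src, dst in flows:
        key = (src, dst) if symmetric else src
        if key not in ports:
            ports[key] = first_port + len(ports)
        assigned[(src, dst)] = ports[key]
    return assigned


phone = ("192.168.0.4", 5060)
flows = [(phone, ("62.13.47.8", 5060)), (phone, ("198.51.100.7", 3478))]
cone = allocate_ports(flows, symmetric=False)
symm = allocate_ports(flows, symmetric=True)
```

In this model the same LAN socket keeps one public port for both destinations in the non-symmetrical case, but receives a separate public port per destination in the symmetrical case.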

One solution to this problem is to let a network administrator add entries to the NAT table by hand. Packets received on a certain port are then forwarded to a certain host and port in the LAN. However, in larger networks this can be a huge job. There are other solutions like TURN (Traversal Using Relay NAT), ALG (Application Layer Gateway), tunneling, and STUN (Simple Traversal of UDP over NAT). The first three are briefly described.

• TURN uses a server outside the LAN, which accepts and forwards all traffic from a host inside the LAN. A request to the TURN server is needed to discover the public IP address and port that will be used for the traffic. Since all traffic is sent to the same host in the WAN, there are entries in the NAT table, and traffic received by the TURN server can be forwarded to the host in the LAN.

• An ALG is a smart gateway, which scans the data that flows in and out. If, for example, a SIP message passes, it silently replaces the given IP address and port numbers with its own public IP address and reserved port number in the Contact field and in the SDP when a SIP connection is being set up. A disadvantage is that the gateway must be updated.

• Tunneling is a technique which needs one server in the LAN and one server in the WAN. These servers create a pipe through the firewall. The server in the WAN receives all messages destined for the host in the LAN. It forwards all received messages to the connected host in the LAN and forwards all messages received from the host in the LAN to the destination in the WAN. This solution may add delay to the connection and, more importantly, it is a breach in the security of the firewall.


The last solution, which is called STUN, is described in the next section.

2.5.3 STUN: Simple Traversal of UDP over NAT

STUN is a very simple protocol to discover the public IP address and port mapping. It requires a STUN server on the public Internet. For each port mapping, one request is needed and one response is received. For example, suppose the public port number for the internal port 5060 (used for SIP) must be discovered. First, a request is sent from port 5060 to the STUN server. The STUN server is capable of determining from which public IP address and port number the request came. This information is packed in the response packet and sent back to the requester. The public IP address is now known to the host in the LAN, and the host can advertise this information in order to be found from the public Internet. This protocol is used by the DVS Phone before it registers at the SIP server. It discovers its public IP address and port number for SIP (port 5060). When a media connection is to be set up using SIP, the public port numbers for RTP and RTCP need to be known. This requires another two requests.
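The request and response can be illustrated with a minimal Python sketch. It assumes the classic STUN message layout of RFC 3489 (a 20-byte header followed by type-length-value attributes, with the discovered address in the MAPPED-ADDRESS attribute). Whether the DVS Phone builds its messages exactly this way is not stated here, and the function names are hypothetical. The example response is constructed locally instead of being received from a server.

```python
import os
import socket
import struct

BINDING_REQUEST = 0x0001
BINDING_RESPONSE = 0x0101
MAPPED_ADDRESS = 0x0001


def build_binding_request(transaction_id=None):
    """20-byte header: type, attribute length (0), 16-byte transaction ID."""
    tid = transaction_id if transaction_id is not None else os.urandom(16)
    assert len(tid) == 16
    return struct.pack("!HH", BINDING_REQUEST, 0) + tid


def parse_mapped_address(message):
    """Return (public_ip, public_port) from a Binding Response, else None."""
    msg_type, length = struct.unpack("!HH", message[:4])
    if msg_type != BINDING_RESPONSE:
        return None
    pos, end = 20, 20 + length
    while pos + 4 <= end:
        attr_type, attr_len = struct.unpack("!HH", message[pos:pos + 4])
        value = message[pos + 4:pos + 4 + attr_len]
        # MAPPED-ADDRESS value: unused byte, family (1 = IPv4), port, address
        if attr_type == MAPPED_ADDRESS and attr_len == 8 and value[1] == 0x01:
            port = struct.unpack("!H", value[2:4])[0]
            return socket.inet_ntoa(value[4:8]), port
        pos += 4 + attr_len
    return None


# Locally constructed response, as a server would send it for Figure 2.14.
tid = bytes(16)
request = build_binding_request(tid)
attr = struct.pack("!HHBBH", MAPPED_ADDRESS, 8, 0, 1, 2369) + socket.inet_aton("80.63.129.3")
response = struct.pack("!HH", BINDING_RESPONSE, len(attr)) + tid + attr
mapping = parse_mapped_address(response)
```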

2.6 Additional functionalities

Some functionalities were added to the DVS Phone, i.e., DHCP, DNS and NTP. These functionalities do not belong to the minimal requirements to function as a VoIP phone; however, they strongly contribute to user-friendliness. They are discussed in the sections below.

2.6.1 DHCP: Dynamic Host Configuration Protocol

DHCP is a solution for easily connecting devices to an IP network. The Dynamic Host Configuration Protocol is a way to automatically retrieve an IP address and all other information needed to operate on the network, e.g., subnet mask, gateway, and DNS server. This service, defined in RFC 2131, works on UDP port 67. When DHCP is switched on in the DVS Phone, all network addresses are retrieved automatically and the user does not need to care about them. When DHCP is not available on the network, the necessary IP addresses can be entered manually. The entered or retrieved addresses are displayed on the VGA screen.

2.6.2 DNS lookup: Domain Name Service

The Internet works with uniquely addressable instances, identified by an IP address. However, SIP addresses are most of the time of the form [email protected]. When this address is entered by the user, the IP address of the wonderland.com server must be found. DNS translates the wonderland.com text into an IP address, as follows. A DNS request is sent to the DNS server on UDP port 53. The reply message from the DNS server contains information on the requested name, of which the IP address is one part.
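As an illustration of what such a request looks like on the wire, the sketch below builds a standard DNS query for the address (A) record of wonderland.com. The function name and the query ID are arbitrary choices for this example. Sending the packet and parsing the reply are left out.

```python
import struct


def build_dns_query(name, query_id=0x1234):
    """Build a DNS query for the A record of `name` (recursion desired)."""
    # Header: ID, flags (RD set), 1 question, 0 answer/authority/additional
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # Name as length-prefixed labels, terminated by a zero-length label
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii") for label in name.split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)    # QTYPE A, QCLASS IN
    return header + question


query = build_dns_query("wonderland.com")
```

The resulting 32-byte datagram would be sent to UDP port 53 of the configured DNS server.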


2.6.3 NTP: Network Time Protocol

A time request is sent to the time server as soon as the IP address is retrieved (with DHCP on) or directly at startup (with DHCP off). The address of the time server can currently not be modified by the user and is set by default to 130.161.180.1 (time server of the TU Delft). A request is an empty UDP packet on port 37. The time server responds directly to this request with time information, which includes the current time. The clock of the DVS Phone is then set to this time. Strictly speaking, the round trip time (RTT) to the time server should be measured and the clock should be corrected accordingly; however, this is not implemented. When the geographical distance to this time server is less than several kilometers, which will mostly be the case in this project, the RTT is in the order of magnitude of tenths of seconds and will not cause a significant deviation of the received time.
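An empty UDP datagram sent to port 37 matches the Time Protocol of RFC 868, in which the reply is a 32-bit big-endian count of seconds since 1 January 1900. Assuming that the time server answers in this format, the following sketch converts a reply to a UTC timestamp and shows the RTT correction that the text mentions as not implemented. The function names are hypothetical, and the correction assumes that the delay is the same in both directions.

```python
import struct
from datetime import datetime, timedelta, timezone

SECONDS_1900_TO_1970 = 2208988800      # offset between the 1900 and 1970 epochs


def parse_time_reply(payload):
    """Convert a 4-byte 'seconds since 1900' reply to a UTC datetime."""
    (seconds_since_1900,) = struct.unpack("!I", payload[:4])
    unix_seconds = seconds_since_1900 - SECONDS_1900_TO_1970
    return datetime.fromtimestamp(unix_seconds, tz=timezone.utc)


def compensate_rtt(server_time, rtt_seconds):
    """Add half of the measured round trip time (symmetric path assumed)."""
    return server_time + timedelta(seconds=rtt_seconds / 2)


# Locally constructed reply: one day after the Unix epoch.
reply = struct.pack("!I", SECONDS_1900_TO_1970 + 86400)
received = parse_time_reply(reply)
corrected = compensate_rtt(received, 0.2)
```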

2.7 Conclusions

We presented the basic knowledge needed to understand the context of this thesis. Signaling is needed in order to find other VoIP phones on the Internet, and the SIP protocol has been chosen for implementation. Signaling fulfills the first minimal requirement of connecting the DVS Phone to other VoIP phones. The second minimal requirement was that the voice data is transported using the Internet. This could be done by packetizing raw samples; however, problems emerge. Sending raw samples consumes much bandwidth and the receiving side has to know the format of the received data. The latter problem even arises when sending samples, since sample values can be represented in multiple ways. Another problem is that the Internet cannot guarantee the in-order arrival of packets. The Real-time Transport Protocol (RTP) offers a solution for these problems. RTP numbers the packets in the order sent and specifies the payload type. With this, the minimal requirements of the VoIP phone are met.

In order to save bandwidth, voice compression is performed. The GSM codec is chosen, because the algorithm is not patented and therefore freely usable. The GSM codec is widely used and well documented.

Working on local networks (LANs) can cause some problems when using VoIP, e.g., accessing a host in a LAN from the public Internet. STUN can be a solution for these problems. Extra features were added to the DVS Phone to finish the phone as a product. These features are DHCP and DNS. DHCP enhances the plug-and-play behavior of the DVS Phone, because it is able to automatically retrieve an IP address and other necessary network addresses. DNS is needed to convert names like wonderland.com to an IP address.


3 Implementation details

The previous chapter described what functions must be implemented on the XUP board in order to operate as a VoIP phone. This chapter will elaborate on the actual implementation.

The main component for this project is the Xilinx University Program Virtex II Pro (XUPV2P) board. The board has connectors for peripherals that can also be found on the average personal computer, e.g., Ethernet, sound input/output, keyboard input, VGA output, and a serial interface. As already stated, the Ethernet and sound input/output are crucial for this project. The Ethernet connector must be connected to a network that has access to the Internet in order to make phone calls all over the world. For the sound input/output any device suffices. The DVS Phone is tested with a headset. The keyboard/VGA combination serves as user interface and the serial interface is used for debugging purposes.

There is also some simple input/output in the form of four LEDs, four DIP switches and five push buttons. Each LED has its own meaning, as explained below:

• LED 0: When some User Agent INVITEs the DVS Phone, the DVS Phone replies with the Ringing (180 Ringing) message and starts flashing this LED. No audio indication is implemented yet.

• LED 1: When the DVS Phone has INVITEd a friend and waits for the contact to answer the invitation, this LED flashes.

• LED 2: When a speech connection is established between the DVS Phone and another User Agent, this LED is on.

• LED 3: When the SIP registration service is registering at the REGISTRAR server, this LED flashes. This procedure is initiated at startup and when half of the subscription time has expired. When subscription has succeeded, the LED stays on.

The push buttons are configured in the following way:

• Up button: When the Up button is pressed, the DVS Phone initiates a phone call to the contact entered in the "Friend SIP address" field.

• Enter button: When the DVS Phone is INVITEd by another User Agent, the phone can be picked up by pressing the Enter button.

• Down button: To end any conversation or to cancel an invitation, the Down button has to be pressed.


The Left and Right buttons do not serve any function and the DIP switches are only used for debugging purposes.

Section 3.1 and Section 3.2 describe the hardware and software architecture, respectively. Section 3.3 describes the methods for acquiring profiling results and how the speedup will be calculated. The first profiling results are presented in Section 4.1; based on them, a separation is made between software and hardware. Section 3.4 describes the design and implementation of the hardware accelerators. Section 3.5 concludes this chapter.

3.1 Hardware

Figure 3.1 depicts an overview of the XUP board with the Virtex II Pro, its internal configuration and the connections between the Virtex and the connectors. The interface modules (FPGA IP cores) for these peripherals are delivered as standard with the Xilinx EDK. A summary of the synthesis report is given below. It indicates that this project uses 47% of the slices of the Virtex II Pro, including the hardware accelerators. The following sections describe each of the following modules at hardware level: Buses, Memory, Ethernet, AC97, PS/2, VGA, RS232. The last section describes the Ethernet interrupt mechanism.

Device utilization summary:

Resource                          Used   Available   Utilization
Number of External IOBs             77         556           13%
Number of LOCed External IOBs       77          77          100%
Number of MULT18X18s                 4         136            2%
Number of PPC405s                    1           2           50%
Number of RAMB16s                  130         136           95%
Number of SLICEs                  6557       13696           47%
Number of BUFGMUXs                   4          16           25%
Number of DCMs                       1           8           12%

3.1.1 Buses

Figure 3.1 depicts two buses, the PLB (Processor Local Bus) and the OPB (On-chip Peripheral Bus). The PLB is a 64-bit wide bus and the OPB is a 32-bit wide bus. There is little difference between the PLB and the OPB. A rule of thumb is that a processor (of which two are present in the Virtex II Pro) has its own PLB with its own program and data memory. Some modules which demand high-speed access are attached to the PLB. The OPB is used for peripherals. If two processors are used, they both have their own PLB; however, these buses can be attached to one OPB with bridges. Consequently, the processors can share peripherals.


FPGA (xc2vp30ff896-7, Virtex II Pro): PowerPC core, PLB, OPB, PLB2OPB Bridge, MAC, BRAM Memory (Code Memory 128KB, Data Memory 64KB), VGA Driver, Timer/counter, PS/2, AC97, Cross Correlation, Dual Lattice Filter, Eth Rcv Interrupt
Chips on the XUP board: LXT972 (Ethernet transceiver), LM4550 (AC97 Codec), FMS3818KRC, MAX3388ECUG, BSS138 (labels: DA converters, RS232 Line driver, Bidirectional Level shifters)
External connections: Mic, Headphone, Line in, VGA Monitor, Keyboard, Terminal, Internet

Figure 3.1: XUP board Architecture and internal Virtex structure

3.1.2 Memory

Although the PowerPC (footnote 1) implemented in the Virtex has an address space of 4 GB (32-bit addresses), in this design it can only address a code memory space of 128 kB and a data memory space of 64 kB. Therefore, the memory is split over two different memory modules. The data, heap and stack sections fit into the 64 kB memory block. The code section is given its own memory block of 128 kB.

3.1.3 Ethernet

The MAC core provides an interface between the PowerPC and the Ethernet Transceiver LXT972. The MAC core is depicted in Figure 3.2 and is accessed via the PLB, because it requires high-speed access.

PLB bus, PLB Interface Logic and MAC IP inside the MAC Core (FPGA); 4-bit TxD and 4-bit RxD, Carrier Sense and Collision detected signals to the LXT972 Ethernet Transceiver; TxD and RxD to the RJ-45 connector

Figure 3.2: Ethernet core connections

1 The Virtex II Pro is supplied with the PowerPC 405D5 core, which is a 0.13 µm implementation of the IBM PowerPC 405D4 core.


The DVS Phone only uses UDP, and each UDP datagram is packed in one Ethernet frame. The maximum length of an Ethernet frame is 1518 bytes, of which at most 1500 bytes are payload (the MTU); this restricts the length of a UDP packet. The format of an Ethernet frame is depicted in Figure 3.3.
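The resulting size limits follow from simple arithmetic, sketched below. The Ethernet field sizes are those of the frame format in Figure 3.3. The IPv4 header without options (20 bytes), the UDP header (8 bytes) and the fixed RTP header (12 bytes) are standard sizes. That a voice packet carries exactly one 33-byte GSM frame is an assumption made for this example.

```python
ETH_MAX_FRAME = 1518      # destination .. frame check sequence
ETH_HEADER = 6 + 6 + 2    # destination address, source address, type/length
ETH_FCS = 4               # frame check sequence
IPV4_HEADER = 20          # without options
UDP_HEADER = 8
RTP_HEADER = 12           # fixed header, no CSRC entries
GSM_FRAME = 33            # assumed: one GSM frame per packet

max_ip_packet = ETH_MAX_FRAME - ETH_HEADER - ETH_FCS
max_udp_payload = max_ip_packet - IPV4_HEADER - UDP_HEADER
voice_ip_packet = IPV4_HEADER + UDP_HEADER + RTP_HEADER + GSM_FRAME
voice_eth_frame = ETH_HEADER + voice_ip_packet + ETH_FCS
```

Under these assumptions a UDP datagram can carry at most 1472 bytes, and a voice packet with one GSM frame occupies only 73 bytes at the IP level, far below the limit.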

Ethernet frame (64 - 1518 bytes), number of bytes per field:
Preamble 7 | Start of Frame Delimiter (SFD) 1 | Destination Address 6 | Source Address 6 | Type/Length 2 | Data 0 - 1500 | Pad 0 - 46 | Frame Check Sequence 4

Figure 3.3: Ethernet Frame data format (source: Xilinx CoreLogic PLB MAC controller)

The software prepares the source address, destination address, type, data and padding fields. The MAC core adds the Preamble, Start of Frame Delimiter (SFD) and the Frame Check Sequence (FCS). This complete Ethernet frame is sent in parallel, one nibble at a time, to the Ethernet Transceiver LXT972, which represents the PHY layer of the network stack. The Ethernet Transceiver takes care of sending the Ethernet frame along the line and reading incoming data from the line. It also maintains signals for the line status, e.g., link up, carrier sense, collision detected. The link up, receive data and 10Base/100Base choice signals are visualized by three LEDs near the RJ-45 connector. The Ethernet Transceiver can be used in 10Base or 100Base transmission mode. In addition, it is capable of automatically detecting the transmission speed. This option is used in the DVS Phone.
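The software side of this division of work can be sketched as follows: the fields listed above are concatenated and short payloads are padded so that data plus padding is at least 46 bytes, as in Figure 3.3. The function name and the example addresses are hypothetical; the actual driver code of the DVS Phone is not shown here.

```python
import struct


def build_frame_fields(dst_mac, src_mac, ethertype, payload):
    """Concatenate the software-prepared fields; the MAC core adds the rest."""
    assert len(dst_mac) == 6 and len(src_mac) == 6
    assert len(payload) <= 1500
    pad = b"\x00" * max(0, 46 - len(payload))       # data + pad >= 46 bytes
    return dst_mac + src_mac + struct.pack("!H", ethertype) + payload + pad


broadcast = b"\xff" * 6
own_mac = bytes.fromhex("000a35000001")             # made-up example address
short = build_frame_fields(broadcast, own_mac, 0x0800, b"x" * 10)
longer = build_frame_fields(broadcast, own_mac, 0x0800, b"x" * 100)
```

With the 4-byte FCS appended by the MAC core, the padded short frame reaches the 64-byte minimum.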

3.1.4 AC97

The AC97 IP core controls the LM4550 AC97 codec chip. Because the LM4550 is synchronized to its own clock, the AC97 core uses FIFO buffers to provide buffering between the two different clock domains. The block scheme is depicted in Figure 3.4.

OPB Clock Domain: OPB bus, OPB Interface Logic, Codec Control/Status Register (Address 7 bits, Read Data 16 bits, Write Data 16 bits), Playback FIFO, Record FIFO
AC97 Clock Domain: AC97 Interface logic, Codec Data, Parallel to Serial and Serial to Parallel converters, MUX
LM4550 AC97 Codec: SData_In, SData_Out, Sync, Reset, Bit_Clk (12.288 MHz); Mic, Line-in, Headphone

Figure 3.4: AC97 core implemented in the FPGA

One side of the AC97 module is clocked by the OPB bus clock (100 MHz) and the other side is clocked by the Bit Clk (12.288 MHz), which is generated by the LM4550. AC97 defines a protocol to send samples to and receive samples from the LM4550, and to receive status data and send control data. The protocol is described in detail in the datasheet of the LM4550. Data is sent bidirectionally in frames of 20.8 µs, as depicted in Figure 3.5. A frame is separated into two phases: the tag phase and the data phase. The Sync signal indicates the tag phase and also announces the beginning of a new frame. The Sync signal is high during the first 16 bits. These bits indicate whether the data in the slots is valid. After the tag phase, 12 data slots of 20 bits each are sent. Each slot has its own content type, which is depicted in Figure 3.6. As a result, one frame contains exactly 256 bits.
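The frame length and frame duration quoted above are consistent with each other, as the following short calculation shows. The variable names are chosen for this sketch.

```python
BIT_CLK_HZ = 12_288_000    # Bit Clk generated by the LM4550
TAG_BITS = 16              # tag phase
DATA_SLOTS = 12
SLOT_BITS = 20

frame_bits = TAG_BITS + DATA_SLOTS * SLOT_BITS
frame_rate_hz = BIT_CLK_HZ / frame_bits
frame_period_us = 1e6 / frame_rate_hz
```

One frame of 256 bits at 12.288 MHz takes about 20.8 µs, which corresponds to 48000 frames per second.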

Figure 3.5: AC97 communication signals (source: datasheet LM4550, National Semiconductor)

The AC97 IP core used by the DVS Phone only supports slots 1 through 4. This is sufficient to read the status registers, write the control registers, read recorded PCM data and write playback PCM data.

Figure 3.6: AC97 audio frame (source: datasheet LM4550, National Semiconductor)

3.1.5 PS/2

The PS/2 IP core drives the serial communication with the keyboard and mouse connectors, of which only the keyboard connector is used. The PS/2 protocol uses two wires for bidirectional communication. One is the clock line, which is controlled only at the FPGA side. The other is the data line, which is bidirectional because it is driven at both sides by an open collector. The PS/2 sender and receiver are used in polling mode. The PS/2 core is depicted in Figure 3.7.


Figure 3.7: PS/2 core (block diagram: OPB interface logic, control logic, serial-to-parallel and parallel-to-serial converters, bidirectional port, BSS138 bidirectional level shifter and PS/2 connector with Clk and Data lines)

For both transmitting and receiving, FIFO buffers are used. The control logic drives the clock line and takes care of switching from send mode to receive mode and vice versa.

3.1.6 VGA

The VGA core is depicted in Figure 3.8 and contains a memory, a character decoder, a color decoder and some control logic.

Figure 3.8: VGA core (block diagram: OPB interface logic, BRAM memory, character ROM, color decoder, control logic generating row, column, HSync and VSync, DCM deriving the 40 MHz pixel clock from the 100 MHz clock, FMS3818KRC D/A converters and SubD15 connector)

The memory is built from the standard BRAM component that is available in the Virtex II Pro. The BRAMs have two ports to access the memory contents. The first port enables the software to write (and read) this memory; the second port is used to read out the memory and generate the VGA pixel signal from it. The size of the memory is 32 KB, which allows us to store the 7500 characters and their colors. Each character consists of an 8×8 dot matrix and there are 75 lines of 100 characters. As a result, the screen mode is 800×600 at 60 Hz. This mode needs a 40 MHz pixel clock, which is generated by a DCM (of which the Virtex has 8) by multiplying the 100 MHz system clock by 4 and dividing it by 10. The control logic selects the right memory address and generates the synchronization signals. The character decoder translates the character data to pixels and the color decoder generates the right color for each pixel. The outputs of the VGA core are three color signals (each 2 bits wide; the 6 least significant bits are tied to ground) for the D/A converters, and the horizontal and vertical synchronization signals. An example of the output is depicted in Figure 3.9.
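The relation between the screen mode, the character grid and the pixel clock can be sketched as follows. This is an illustration only: the row-major cell layout and all names are assumptions, since the text does not specify how the control logic addresses the character memory.

```c
#include <assert.h>
#include <stdint.h>

enum {
    VGA_WIDTH  = 800,                     /* pixels per line */
    VGA_HEIGHT = 600,                     /* lines per frame */
    CHAR_SIZE  = 8,                       /* 8x8 dot matrix per character */
    TEXT_COLS  = VGA_WIDTH / CHAR_SIZE,   /* 100 characters per line */
    TEXT_ROWS  = VGA_HEIGHT / CHAR_SIZE   /* 75 lines */
};

/* Index of the character cell that contains pixel (x, y),
 * assuming a row-major layout (hypothetical). */
static uint32_t vga_cell_index(uint32_t x, uint32_t y)
{
    assert(x < VGA_WIDTH && y < VGA_HEIGHT);
    return (y / CHAR_SIZE) * TEXT_COLS + (x / CHAR_SIZE);
}

/* Clock produced by a multiplier/divider pair such as a DCM. */
static uint32_t pixel_clock_hz(uint32_t sys_clk_hz, uint32_t mul, uint32_t div)
{
    assert(div != 0);
    return (uint32_t)((uint64_t)sys_clk_hz * mul / div);
}
```

For example, `pixel_clock_hz(100000000, 4, 10)` yields the 40 MHz pixel clock, and the last pixel of the screen maps to cell 7499, the last of the 7500 characters.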

Figure 3.9: Example VGA output

3.1.7 RS232

RS232 support is implemented for debugging purposes. The serial port is configured as stdin and stdout for the software. The serial port of the XUP board can be connected to any RS232 terminal to read out and input data. The RS232 core programmed in the FPGA is depicted in Figure 3.10.

Figure 3.10: RS232 core (block diagram: OPB interface logic, 8-bit transmit and receive FIFOs, parallel-to-serial and serial-to-parallel converters, MAX3388ECUG RS232 line driver and D9 connector with TxD and RxD lines)


The core contains a FIFO buffer for received data and a FIFO buffer for data to be transmitted. When the software writes a byte to the transmit FIFO, the byte is read at the other side of the FIFO and sent serially over the TxD line, framed by start and stop bits. The baud rate used for the serial port is 9600. In the reverse direction, a received byte is stripped of its start and stop bits and stored in the receive FIFO, from which it can be read by the software. No interrupts are used for the serial communication.
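The framing performed by the serial converters can be illustrated with a small software model. The sketch below assumes the common format of one start bit, eight data bits sent least significant bit first, and one stop bit without parity; the text only states that start and stop bits are added, so this format and all names are assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define UART_FRAME_BITS 10u   /* 1 start + 8 data + 1 stop (assumed format) */

/* Build the bit sequence for one byte. Bit 0 of the result is sent first. */
static uint16_t uart_frame(uint8_t byte)
{
    uint16_t frame = 0;                        /* bit 0: start bit (0) */
    frame |= (uint16_t)((uint16_t)byte << 1);  /* bits 1..8: data, LSB first */
    frame |= (uint16_t)(1u << 9);              /* bit 9: stop bit (1) */
    return frame;
}

/* Recover the data byte; returns -1 on a framing error. */
static int uart_unframe(uint16_t frame)
{
    int start_ok = (frame & 1u) == 0;
    int stop_ok  = ((frame >> 9) & 1u) == 1;
    if (!start_ok || !stop_ok)
        return -1;
    return (frame >> 1) & 0xFF;
}

/* Maximum number of payload bytes per second at a given baud rate. */
static uint32_t uart_bytes_per_second(uint32_t baud)
{
    assert(baud > 0);
    return baud / UART_FRAME_BITS;
}
```

Under this assumed format, 9600 baud corresponds to at most 960 bytes per second, which is sufficient for debugging output.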

3.1.8 Interrupts

Ethernet is implemented using interrupts. The PowerPC supports only one external interrupt signal. The Ethernet MAC module drives a signal IP2INTC_Irpt, which is directly attached to the PowerPC (see Figure 3.1), since the Ethernet interrupt is the only interrupt that is used. If other interrupt sources had been implemented, an interrupt controller would have been necessary to combine all interrupt sources into a single interrupt.

3.2 Software

The software is written entirely in C and compiled by the powerpc-eabi-gcc compiler supplied with the Xilinx EDK. The total size of the program is 142 kB, consisting of a code segment of 123 kB and a data segment of 19 kB. The code segment and the data segment are assigned to separate memory blocks. This assignment is passed to the linker by means of a linker script, which is printed in Appendix A.3.1. Though it is possible to run Linux on the XUP board, this is not done for the DVS Phone. Using a Linux kernel would have advantages with respect to running multiple processes; however, this problem is solved by calling process handlers in a round-robin fashion. This idea is depicted below.

    void main(void)
    {
        // Initializations
        while (1)
        {
            Process1();
            Process2();
            Process3();
            Process4();
            ...
        }
    }

The DVS Phone has the following processes: keyboard input (polling), SIP, DHCP, and audio buffer management. A complete overview of the software is presented in Appendix A. Section 3.2.1 describes the bus structure from the software side and Section 3.2.2 describes the changes that were made to the standard Ethernet library.

3.2.1 Bus structure

As depicted in Figure 3.1, the internal structure of the Virtex contains two buses connected by a bridge. Each module has its own address space in order to be accessed from the PowerPC. The addresses of the modules are depicted in Figure 3.11. The modules attached to the OPB bus are accessed via the PLB2OPB bridge. The bridge can map up to four address ranges onto the OPB bus, of which two are used. In this project, the addresses 0x40000000-0x4000FFFF and 0x00000000-0x00007FFF are mapped onto the OPB bus. The first address range gives access to all peripherals except the Ethernet MAC, and the second address range gives access to the character frame buffer of the VGA display.
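The address decoding performed by the bridge can be modeled in software as a simple range lookup. The sketch below uses the two ranges given in the text; the type and function names are hypothetical and only illustrate the idea.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uint32_t base;   /* first address of the range */
    uint32_t high;   /* last address of the range (inclusive) */
} addr_range_t;

/* The two ranges in use, taken from the text. */
static const addr_range_t opb_ranges[] = {
    { 0x40000000u, 0x4000FFFFu },   /* peripherals except the Ethernet MAC */
    { 0x00000000u, 0x00007FFFu },   /* VGA character frame buffer */
};

/* Returns the index of the matching range, or -1 if the address
 * is not forwarded to the OPB bus. */
static int opb_range_of(uint32_t addr)
{
    size_t n = sizeof opb_ranges / sizeof opb_ranges[0];
    assert(n <= 4);   /* the bridge supports at most four ranges */
    for (size_t i = 0; i < n; i++) {
        if (addr >= opb_ranges[i].base && addr <= opb_ranges[i].high)
            return (int)i;
    }
    return -1;
}
```

An access to 0x40007000, for instance, falls in the first range and is forwarded to the OPB bus, whereas 0x00008000 lies just outside the frame buffer range.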

Figure 3.11: Address ranges of modules

3.2.2 Ethernet

Xilinx delivers a standard Ethernet library to communicate through the Ethernet connection. However, this is a rudimentary version of the library and some major enhancements were needed. First, the buffer for received data was shared by all sockets. When two packets are received in succession, the data of the former is overwritten by the latter. For example, RTP data could overwrite SIP data, which may result in missing essential call setup/teardown information. This problem was solved by creating a buffer for each socket. The software has sockets for the following services: SIP, RTP, RTCP, DHCP, DNS and NTP. The buffers for DHCP, DNS and NTP are shared because these protocols never run at the same time. Second, unnecessary code, including functions that were not used, was deleted from the source files in order to reduce the code size.

3.3 Timing Measurement

A key element in the timing measurement is the PowerPC’s 64-bit time base, which runs at the system clock speed of 100 MHz. Consequently, the measurements theoretically have a resolution of 10 ns. However, starting and stopping the timer has some overhead, e.g., executing instructions to read and write the time register. Three timing measurements are performed: processor usage, detailed profiling and ping time. They are discussed in the following.

Processor usage. Because the software does not really support running ’processes’ at the same time, it calls the functions belonging to the different tasks, denoted by ’process functions’, one after another. Before a process function is called, the time register is stored, and on return the stored value is subtracted from the current value of the time register. For each process function the execution time is measured, and the execution times are divided by the total execution time of all process functions to determine the relative process times. The set of relative process times is called the processor usage. The implementation can be abstracted to the following structure, where the argument, e.g., PROCESS1, is a constant indicating the current process to measure.

    void main(void)
    {
        // Initializations
        while (1)
        {
            ChangeProcUsageCategory(PROCESS1);
            Process1();
            ChangeProcUsageCategory(PROCESS2);
            Process2();
            ChangeProcUsageCategory(PROCESS3);
            Process3();
            ChangeProcUsageCategory(PROCESS4);
            Process4();
            ...
            ChangeProcUsageCategory(IDLE);
        }
    }

The function ChangeProcUsageCategory captures the time register and subtracts from it the value captured at the previous ChangeProcUsageCategory call. The resulting process time is accumulated in the variable of the process that was running. Every second the processor usage (U) is calculated. The processor usage for process x is obtained using the following formula:

$$U[x] = \frac{T[x]}{\sum_{n=1}^{p} T[n]} \tag{3.1}$$

Here, p is the number of processes and the array T contains all accumulated execution times. After the calculation, the values are displayed on the VGA screen and the accumulated process time variables are cleared for a new measurement. Although not shown, the main loop contains small pieces of code that only call a process when needed. For example, the RTP unwrapping, GSM decoding and audio playback only have to be called when data has arrived. Some flow control operations, e.g., testing whether data has arrived, are assigned to the IDLE process time, because they are overhead operations. The exact main loop layout is presented in Appendix A.2.
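The accounting scheme can be sketched in a few lines of C. This is not the code of Appendix A.2: the function is deliberately given a different name and an explicit time argument (standing in for a read of the 64-bit time base) so that the logic can be followed without the hardware, and the usage is expressed in whole percent.

```c
#include <assert.h>
#include <stdint.h>

enum { PROCESS1, PROCESS2, PROCESS3, PROCESS4, IDLE, NUM_CATEGORIES };

static uint64_t T[NUM_CATEGORIES];   /* accumulated ticks per category */
static uint64_t last_stamp;          /* time base value at the previous call */
static int      current = IDLE;      /* category currently being measured */

/* Close the running category at time 'now' and start measuring 'next'.
 * 'now' is a hypothetical argument replacing the time register read. */
static void change_usage_category(int next, uint64_t now)
{
    assert(next >= 0 && next < NUM_CATEGORIES);
    T[current] += now - last_stamp;
    last_stamp = now;
    current = next;
}

/* Equation 3.1 in whole percent; returns 0 if nothing was measured. */
static unsigned usage_percent(int x)
{
    uint64_t total = 0;
    for (int n = 0; n < NUM_CATEGORIES; n++)
        total += T[n];
    return total ? (unsigned)(100 * T[x] / total) : 0;
}

/* Clear the accumulators for the next measurement interval. */
static void reset_usage(void)
{
    for (int n = 0; n < NUM_CATEGORIES; n++)
        T[n] = 0;
}
```

Because every tick between two calls is attributed to exactly one category, the percentages of all categories, including IDLE, add up to roughly 100% (apart from rounding).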

Detailed profiling of the GSM codec. For the GSM encoder, five subfunctions are distinguished (preprocessing, LPC analysis, Short Term Analysis, Long Term Prediction, RPE encoding) and for the GSM decoder, four subfunctions (RPE decoding, Long Term Synthesis, Short Term Synthesis, postprocessing). For each subfunction, the execution time is measured. The encoder and decoder functions as a whole are also measured (including the overhead of starting and stopping the timer for the subfunction measurements). In addition, before the timer is started, an ’interrupt occurred’ flag is reset. The interrupt handler sets this flag. If the flag is set when the timer is stopped, the result is discarded and a ’0’ is displayed for that measurement.

Ping time. The time between an ICMP request and its reply is measured. When an ICMP request is sent, the value of the time register is stored and the ping_response_time variable is set to -1. When an ICMP reply is received, the time difference is stored in ping_response_time. An ICMP request is sent every 5 seconds while in the state Talking to measure the Round Trip Time (RTT). Since a request is sent regardless of whether a reply to the previous one has been received, timeouts are not detected; however, the value of ping_response_time is only used if it is not equal to -1.
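The bookkeeping for the ping measurement can be sketched as follows. The function names and the explicit time arguments are hypothetical; they replace the actual ICMP handlers and the time register read, which are not shown in this section.

```c
#include <assert.h>
#include <stdint.h>

#define TICKS_PER_US 100   /* 100 MHz time base */

static uint64_t ping_sent_at;              /* time base value at the request */
static int64_t  ping_response_time = -1;   /* in ticks; -1 means no reply yet */

/* Called when an ICMP request leaves the phone. */
static void on_ping_request_sent(uint64_t now)
{
    ping_sent_at = now;
    ping_response_time = -1;
}

/* Called when the matching ICMP reply arrives. */
static void on_ping_reply_received(uint64_t now)
{
    assert(now >= ping_sent_at);
    ping_response_time = (int64_t)(now - ping_sent_at);
}

/* Round trip time in microseconds, or -1 if no valid measurement exists. */
static int64_t ping_rtt_us(void)
{
    return ping_response_time < 0 ? -1 : ping_response_time / TICKS_PER_US;
}
```

A lost reply simply leaves the value at -1 until the next request, which matches the behavior described above: no timeout is detected, but an invalid value is never used.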

Once the timing measurements have been performed, the speedup of the accelerated parts is calculated. The formula for the speedup is as follows:

$$S = \frac{T_{original}}{T_{accelerated}} \tag{3.2}$$

Here, $T_{original}$ is the execution time of the original, pure software version and $T_{accelerated}$ is the execution time when hardware acceleration is activated. This formula will be used for the execution time of the GSM encoding/decoding and for their subfunctions.

All measurements are performed in real time and displayed on the VGA screen. Switching the hardware acceleration on and off has an immediate effect on the GSM encoding/decoding times, as well as on the processor usage. This effect can be observed directly in the values displayed on the VGA screen. Chapter 4 gives the results of the measurements.

3.4 Reconfigurable hardware design

Once profiling has been performed, the next step is to select a part of the algorithm to extract from software and implement in hardware. Different aspects should be considered when drawing the line between software and hardware. The first aspect is the amount of data to transport between software and hardware: if more data must be exchanged, the total execution time of the algorithm will be longer and the speedup will be lower. Another aspect is the size of the hardware implementation and the expected speedup it will produce.

The requirements for the hardware to be designed are the following. The hardware should give the same results as the software. This can easily be verified by feeding test data into the software version and comparing the results with those of the hardware version. Another requirement is that the hardware accelerators should not use more than about 10% of the available slices. This limit is soft and exceeding it by a few percent is permitted. Substantial space must be left for future enhancements and functionality.

Section 3.4.1 presents the first approach. A complete GSM decoder was implemented in hardware; however, it occupied 28% of the slices. In addition, the GSM encoder would use more, since the decoder parts are also found in the encoder. If both were implemented in hardware, the GSM encoder and decoder would together use more than 56% of the slices. This conflicts with the requirement not to exceed 10%. As a result, parts of the encoder and decoder were selected for implementation in hardware. Through profiling, we observed that within the GSM encoder the Long Term Predictor and the Short Term Analysis together account for 75% of the encoding time. Therefore, these functions were ported from software to hardware. Within the GSM decoder, we noticed that the Short Term Synthesis Filter accounts for up to 80% of the GSM decoding time. Therefore, this filter was also ported from software to hardware. Section 3.4.2 describes the hardware implementation of the Short Term Analysis Filter and the Short Term Synthesis Filter. The filters have a nearly identical structure and therefore share the same design. Section 3.4.3 describes the hardware design of the Long Term Predictor, found in the GSM encoder. Section 3.4.4 presents some optimizations that were performed regarding the Short Term Filters.

3.4.1 GSM Decoder

The initial approach was to implement a complete GSM decoder in hardware, because the communication overhead would be minimal. Only 33 bytes would have to be sent to the hardware accelerator and 320 bytes (160 16-bit words) would have to be sent back. We implemented the GSM decoder first, because all parts of the decoder can also be found in the encoder.

Figure 3.12: GSM hardware decoder module architecture (block diagram: OPB bus interface with control/status and 32-bit registers, 16x32-bit and 128x32-bit BRAMs, FSM, GSM parameter decoder, 81-bit and 39-bit shift registers with lookup table, LAR to reflection coefficient conversion, reflection coefficient interpolator and inverse lattice filter)

The decoder design is depicted in Figure 3.12. The large 81-bit and 39-bit shift registers, together with the lookup table, form the basis of the decoder. They reconstruct the residual signal, using a feedback loop to copy samples according to the lag parameter (Nj). The correcting APCM samples xj[k] are fed in and scaled by the factor xmax. Since the APCM samples are downsampled by a factor of 3, two zero values are inserted between every two consecutive samples. The nonzero sample signal controls the upsampling.
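The upsampling step can be illustrated with a short software model. The sketch assumes 13 APCM samples per 40-sample subframe and a grid offset between 0 and 3, as in the GSM 06.10 full-rate codec; neither the offset nor the function name is taken from the text above.

```c
#include <assert.h>
#include <stdint.h>

#define RPE_SAMPLES  13   /* APCM samples per subframe (assumed) */
#define SUBFRAME_LEN 40   /* residual samples per subframe */

/* Spread 13 samples over a 40-sample subframe, starting at position
 * 'grid' (0..3) and leaving two zeros between consecutive samples. */
static void rpe_upsample(const int16_t x[RPE_SAMPLES], unsigned grid,
                         int16_t out[SUBFRAME_LEN])
{
    assert(grid <= 3);
    for (int k = 0; k < SUBFRAME_LEN; k++)
        out[k] = 0;
    for (int i = 0; i < RPE_SAMPLES; i++)
        out[grid + 3 * i] = x[i];
}
```

In the hardware, the same effect is obtained without a buffer: the nonzero sample signal indicates on which positions a real sample is shifted in and on which positions a zero is inserted.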

The synthesis report showed that the GSM hardware decoder used 28% of the total number of slices. If the GSM encoder were also ported to hardware, at least 56% of the slices would be used for the encoder and the decoder together. This would exceed the 10% limit that was set to leave space for future implementations. As a result, parts of the encoder and decoder had to be selected for implementation in hardware. Through detailed profiling (all details are presented in Chapter 4), we observed that the Short Term Analysis Filter and the Long Term Predictor are responsible for 75% of the GSM encoding time and that the Short Term Synthesis Filter is responsible for 80% of the GSM decoding time. Implementing only these subfunctions in hardware is sub-optimal with respect to communication overhead, because more data has to be transferred. Both Short Term Filters require 320 bytes to be transported back and forth. The Long Term Predictor requires 320 bytes (40 samples of the d signal and 120 samples of the dp signal) to be transported to the hardware accelerator, and returns the power of the d signal and the position (lag parameter) and value of the cross correlation maximum. The Long Term Predictor result fits in 9 bytes. The design and implementation of the hardware accelerators for the Short Term Filters and the Long Term Predictor are presented in the following sections.

3.4.2 Short Term Filters

Since the Short Term Analysis Filter and the Short Term Synthesis Filter resemble each other, the same hardware accelerator design was used for both. Therefore, in this section we use the terms Short Term Filter and lattice filter when referring to both filters. The geometry of the lattice filter is depicted in Figure 2.8. The inverse filter is depicted in Figure 2.12. The architecture of the lattice filter hardware accelerator is depicted in Figure 3.13. It consists of a control/status register (control when writing and status when reading the register), four 32-bit registers for the reflection coefficients, two dual-port BRAM memories, an FSM and arithmetic logic.

This hardware accelerator has to be used once per frame that is encoded or decoded. One cycle consists of the following steps:

• Copying the data to be filtered to the BRAM, copying the reflection coefficients to the registers;

• Asserting the start bit in the control register;

• Waiting until the finished bit in the status register is asserted;

• Retrieving the filtered signal from the BRAM.
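From the software side, the four steps above could look like the sketch below. The register layout, the bit positions of the start and finished bits, and the packing of two samples per word are all assumptions made for illustration; the real offsets are not given in this section. A poll limit is added so that the routine cannot hang if the accelerator never reports completion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register layout; the real offsets are not given in the text. */
typedef struct {
    volatile uint32_t ctrl_status;    /* write: control, read: status */
    volatile uint32_t refl[4];        /* two 16-bit coefficients per word */
    volatile uint32_t in_bram[128];   /* samples to be filtered */
    volatile uint32_t out_bram[128];  /* filtered samples */
} lat_filter_regs_t;

#define CTRL_START      0x1u   /* assumed bit position */
#define STATUS_FINISHED 0x2u   /* assumed bit position */
#define FRAME_WORDS     80     /* 160 samples, assuming two per 32-bit word */

/* Run one filter cycle. Returns 0 on success, or -1 if the finished bit
 * was not seen within 'max_polls' reads of the status register. */
static int lat_filter_run(lat_filter_regs_t *dev, const uint32_t refl[4],
                          const uint32_t in[FRAME_WORDS],
                          uint32_t out[FRAME_WORDS], unsigned max_polls)
{
    assert(dev != NULL);
    for (size_t i = 0; i < FRAME_WORDS; i++)   /* step 1: copy the data */
        dev->in_bram[i] = in[i];
    for (size_t i = 0; i < 4; i++)             /*         and coefficients */
        dev->refl[i] = refl[i];

    dev->ctrl_status = CTRL_START;             /* step 2: assert start */

    while (!(dev->ctrl_status & STATUS_FINISHED)) {   /* step 3: wait */
        if (max_polls-- == 0)
            return -1;
    }
    for (size_t i = 0; i < FRAME_WORDS; i++)   /* step 4: read the result */
        out[i] = dev->out_bram[i];
    return 0;
}
```

On the target, `dev` would point to the base address of the accelerator on the OPB bus; the `volatile` qualifiers keep the compiler from removing or reordering the register accesses.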

To avoid an abrupt transition between the reflection coefficients of two subsequent frames, the first 40 samples are used for a gradual transition by interpolating the reflection coefficients. This interpolation is performed by the hardware accelerator by shifting and adding the current and the previous reflection coefficients. When a frame has been filtered, the reflection coefficients are stored for interpolation when filtering the next frame.

Figure 3.13: Lattice filter module architecture (block diagram: OPB bus interface with control/status register and 32-bit registers holding the reflection coefficient pairs refl0|refl1 through refl6|refl7, two 128x32-bit BRAMs, FSM with start and finished signals, and hardware multipliers)
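A shift-and-add interpolation of this kind can be sketched as follows. The segment boundaries (0..12, 13..26, 27..39, 40..159) are those shown for the coefficient registers in Figure 3.16; the weights 3/4, 1/2 and 1/4 are an assumption modeled on the interpolation scheme of the GSM 06.10 codec, and the function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Interpolated coefficient for sample position k (0..159) of a frame.
 * Only shifts and additions are used. Right-shifting negative values
 * is assumed to be arithmetic, as on the usual compilers. */
static int16_t interp_coeff(int16_t prev, int16_t cur, int k)
{
    assert(k >= 0 && k < 160);
    if (k < 13)                     /* 3/4 previous + 1/4 current */
        return (int16_t)((prev >> 2) + (prev >> 1) + (cur >> 2));
    if (k < 27)                     /* 1/2 previous + 1/2 current */
        return (int16_t)((prev >> 1) + (cur >> 1));
    if (k < 40)                     /* 1/4 previous + 3/4 current */
        return (int16_t)((prev >> 2) + (cur >> 2) + (cur >> 1));
    return cur;                     /* rest of the frame: current only */
}
```

With a previous coefficient of 4000 and a current coefficient of 8000, the effective coefficient steps through 5000, 6000 and 7000 before settling at 8000 from sample 40 onwards.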

For storing the original signal and retrieving the filtered signal, two BRAM blocks are used. The reason is that the Xilinx synthesizer did not recognize the intended use of a dual-port BRAM and therefore did not infer a single BRAM. Since there are plenty of BRAMs available, this solution was chosen.

When the start signal is asserted, the FSM starts retrieving data from one BRAM. The order in which the filter is calculated is denoted by the blue numbers in Figures 2.8 and 2.12. The output values are written back to the other BRAM.

The design of the lattice filter was verified through simulation and testing. The simulation results are identical to those of the optimized version of the lattice filter; therefore, the reader is referred to Section 3.4.4.

3.4.3 Cross Correlation

The Cross Correlation hardware accelerator calculates the cross correlation function of the d and the dp signals and determines the peak value of this function and the index at which it occurs. The architecture of this IP core is depicted in Figure 3.14.

Via the OPB bus, the PowerPC writes to the BRAM memory of the Cross Correlation IP core. The BRAM contains 128 32-bit words, of which only 80 are used. Each 32-bit word contains two 16-bit samples, which halves the number of write cycles for the processor. The BRAM is divided into 4 parts and is used in a cyclic way, which is depicted in Figure 3.15.
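Packing two samples into one word before writing can be sketched as follows. Which sample ends up in the upper half of the word is not stated in the text, so the order used here is an assumption, as are the function names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Pack two signed 16-bit samples into one 32-bit word.
 * Placing the first sample in the upper half is an assumption. */
static uint32_t pack_samples(int16_t first, int16_t second)
{
    return ((uint32_t)(uint16_t)first << 16) | (uint16_t)second;
}

/* Pack 'n' samples (n even) into n/2 words; returns the word count. */
static size_t pack_buffer(const int16_t *samples, size_t n, uint32_t *words)
{
    assert(n % 2 == 0);
    for (size_t i = 0; i < n / 2; i++)
        words[i] = pack_samples(samples[2 * i], samples[2 * i + 1]);
    return n / 2;
}
```

A subframe of 40 samples thus needs only 20 bus writes, and the four parts of the BRAM together occupy the 80 words mentioned above.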

If subframe 1 is to be encoded (j=0), samples 0 through 39 of the d signal are copied to the first part of the BRAM memory of the Cross Correlation module. Parts 2, 3 and 4 already contain the reconstructed Short Term Residual of the 3 previous subframes, which are needed to calculate the cross correlation. When subframe 1 has been encoded, it is subsequently decoded and the reconstructed dp signal is stored in part 1 of the BRAM memory of the Cross Correlation module. Then, subframe 2 (j=1) is copied to part 2 of the BRAM memory, and parts 3, 4 and 1 contain the previous reconstructed subframes. In this way, memory bandwidth is saved, because any subframe to be encoded requires two memory parts to be written instead of four.

Figure 3.14: Cross-correlation module architecture (block diagram: OPB bus interface with control register, max_corr_index & status, max_corr and power registers, j register, 128x32-bit dual-port BRAM, FSM with start and finished signals, and two hardware multipliers with adders)

    BRAM address   0x40007000     0x40007014     0x40007028     0x4000703C     (end: 0x40007050)
    part           1              2              3              4
    j=0            d[0..39]       dp[-120..-81]  dp[-80..-41]   dp[-40..-1]
    j=1            dp[-40..-1]    d[0..39]       dp[-120..-81]  dp[-80..-41]
    j=2            dp[-80..-41]   dp[-40..-1]    d[0..39]       dp[-120..-81]
    j=3            dp[-120..-81]  dp[-80..-41]   dp[-40..-1]    d[0..39]

Figure 3.15: Subframe BRAM contents
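The cyclic use of the four BRAM parts can be expressed with modular arithmetic. The sketch below uses 0-based part numbers (part 1 in the text is part 0 here); the function names are hypothetical.

```c
#include <assert.h>

#define NUM_PARTS 4

/* Part (0-based) that receives the d signal of subframe j (0..3). */
static int part_of_d(int j)
{
    assert(j >= 0 && j < NUM_PARTS);
    return j;
}

/* Part (0-based) holding the reconstructed subframe that lies 'age'
 * subframes in the past (1 = dp[-40..-1], 2 = dp[-80..-41],
 * 3 = dp[-120..-81]) while subframe j is being encoded. */
static int part_of_dp(int j, int age)
{
    assert(j >= 0 && j < NUM_PARTS);
    assert(age >= 1 && age <= 3);
    return (j - age + NUM_PARTS) % NUM_PARTS;
}

/* First 32-bit word of a part; each part holds 40 samples in 20 words. */
static int part_first_word(int part)
{
    assert(part >= 0 && part < NUM_PARTS);
    return part * 20;
}
```

For j=0, for example, the most recent reconstructed subframe dp[-40..-1] is found in the fourth part, and for j=1 it is found in the first part, which was overwritten after subframe 1 was encoded.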

Once the data has been written to the BRAM, the start bit in the control register is asserted. This starts the FSM, which calculates the cross correlation function and determines the position and the value of its maximum according to the pseudocode given in Appendix B.1.1. The cross correlation function is evaluated for lags -120 through -40, which gives 81 values. These values are not stored, because only the position and the value of the maximum are relevant. The FSM calculates two values of the cross correlation function at a time, because the dual-port memory delivers 32-bit words that each carry two samples. For this purpose, two multiplier/accumulators are implemented, as depicted in Figure 3.14. When the FSM has finished calculating two cross correlation values, they are compared to the current maximum. If a calculated value is greater than the current maximum, it replaces the maximum and its index is stored. When the 81st value is calculated, the second multiplier/accumulator is used to calculate the power according to the following formula:

$$S_j = \sum_{i=0}^{39} d[i]^2 \tag{3.3}$$

The resulting values max_corr_index, max_corr (the value of the maximum) and power are stored in registers that are readable from the OPB bus side and are therefore accessible by the processor. For encoding one GSM frame, the processor has to invoke the accelerator once for each subframe, i.e. 4 times per frame. One cycle consists of:

• Putting samples in the dual port memory and setting the value for j;

• Asserting the start bit in the control register;

• Waiting until the finished bit in the status register is asserted;

• Reading out the values max_corr_index, max_corr (value of the maximum) and power.

The max_corr_index value is in the range [0..80]. It is subtracted from 120, which directly yields the lag parameter for the subframe. The quotient of max_corr and power is coded in the bj parameter, after which some subsequent coding is performed as already explained in Section 2.4.2. When the dp signal has been constructed, it is stored in the Cross Correlation module for comparison when encoding subsequent subframes. The synthesis and timing results are presented in Chapter 4.
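A software reference model is useful for comparing the accelerator output against known-good values. The sketch below is not the pseudocode of Appendix B.1.1: it computes one correlation value at a time, stores the history in a linear array instead of the cyclic BRAM parts, and takes the first value as the initial maximum, all of which are simplifying assumptions.

```c
#include <assert.h>
#include <stdint.h>

#define SUBFRAME_LEN 40
#define HISTORY_LEN  120
#define NUM_LAGS     81    /* index 0..80 corresponds to lag 120..40 */

typedef struct {
    int     max_corr_index;   /* 0..80 */
    int64_t max_corr;         /* value of the maximum */
    int64_t power;            /* sum of d[i]^2, Equation 3.3 */
} xcorr_result_t;

/* hist[HISTORY_LEN + n] holds dp[n] for n = -120..-1. */
static xcorr_result_t cross_correlate(const int16_t d[SUBFRAME_LEN],
                                      const int16_t hist[HISTORY_LEN])
{
    xcorr_result_t r = { 0, 0, 0 };
    for (int m = 0; m < NUM_LAGS; m++) {
        int64_t acc = 0;
        /* dp[i - lag] with lag = 120 - m is stored at hist[i + m] */
        for (int i = 0; i < SUBFRAME_LEN; i++)
            acc += (int64_t)d[i] * hist[i + m];
        if (m == 0 || acc > r.max_corr) {
            r.max_corr = acc;
            r.max_corr_index = m;
        }
    }
    for (int i = 0; i < SUBFRAME_LEN; i++)
        r.power += (int64_t)d[i] * d[i];
    return r;
}

/* Lag parameter derived from the index, as described in the text. */
static int lag_from_index(int max_corr_index)
{
    assert(max_corr_index >= 0 && max_corr_index < NUM_LAGS);
    return HISTORY_LEN - max_corr_index;
}
```

Feeding a single pulse in d and a single pulse in the history gives a result that is easy to check by hand: the maximum appears at the index where the two pulses overlap.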

The most important information of the synthesis result is presented below:

Number of Slices: 524 out of 13696 3%

Number of Slice Flip Flops: 423 out of 27392 1%

Number of 4 input LUTs: 983 out of 27392 3%

Number of BRAMs: 1 out of 136 0%

Number of MULT18X18s: 2 out of 136 1%

A larger part of the synthesis report (the useful information) is presented in Appendix B.1.2. The figures above show that the Cross Correlation module occupies 3% of the slices. As expected from the design depicted in Figure 3.14, it uses 1 BRAM and two 18-bit signed multipliers.

The design was verified using simulation and test data generated with Matlab, with which the correct results could easily be generated. The simulation results are depicted in Appendix B. A first test was performed using a real residual (d) and reconstructed residual (dp) signal. A second test was performed with random data. The hardware accelerator was tested for all four values of j and was found to function correctly.

3.4.4 Optimization

The lattice filter and the inverse lattice filter, which accelerate the Short Term Analysis and the Short Term Synthesis, respectively, resemble each other. They differ only slightly in filter geometry; the input and output formats as well as the control signals are exactly the same. Merging the FSMs of the two filters and adding an extra state that jumps to the right filter algorithm combines the two filters in one hardware accelerator.

When combining the two filters, a control signal must be added to select the proper function. This normal/inverse signal is shown in Figure 3.16, which depicts the Dual Lattice Filter, as the combined hardware accelerator is called. When switching from the lattice filter to the inverse lattice filter or vice versa, the filter states are saved to guarantee smooth filtering between subsequent frames.

Figure 3.16: Dual lattice filter module architecture (block diagram: OPB bus interface with control/status register including the normal/inverse signal, 32-bit registers holding the reflection coefficient pairs refl0|refl1 through refl6|refl7 for the sample ranges 0..12, 13..26, 27..39 and 40..159, two 128x32-bit BRAMs, FSM with start and finished signals, and hardware multipliers)

The most important synthesis information is presented below:

Resource                      Used    Available    Utilization
Number of Slices              1194    13696        8%
Number of Slice Flip Flops    1289    27392        4%
Number of 4 input LUTs        2133    27392        7%
Number of BRAMs                  2      136        1%
Number of MULT18X18s             2      136        1%

A larger part of the synthesis report is presented in Appendix B.2.1. The report above indicates that the Dual Lattice Filter occupies 8% of the slices. As expected from the design depicted in Figure 3.16, the Dual Lattice Filter uses two BRAMs and two 18-bit signed multipliers.

There is another notable change with respect to the single-filter version: the reflection coefficient interpolation has been ported back to software. This is not a compute-intensive operation, and moving it saved 2% of the slices at the expense of some extra communication.

The design of the Dual Lattice Filter was verified by simulation and testing. In both cases, test data was filtered and compared to the result of the software version. The test data contained pulses to verify the impulse response. The pulses were placed at positions 0, 13, 27, and 40 to verify a correct interpolation of the filter. The simulation results are depicted in Appendix B. Figure B.2.2 depicts the calculation of one output value. After the start signal is asserted, the start state determines according to the norm_ninv signal whether the normal or the inverse filter must be called. In this case, the impulse test data is read. The impulse has a value of 0x2000 and the calculations are visible. v0_norm through v7_norm hold the filter's state between different frames for smooth filtering. After the filter output is calculated, the result is written back to the BRAM to be read out by the software. Figure B.2.2 depicts the filtering of a complete frame for the normal lattice filter as well as the inverse lattice filter. After the simulation results were verified, the Dual Lattice Filter was implemented and tested. These tests were passed and the Dual Lattice Filter was integrated into the GSM encoder and decoder.

3.5 Conclusions

We presented the implementation details for the software as well as the hardware. The XUPV2P board was described in detail, including the implemented peripherals, e.g., Ethernet, sound input/output, keyboard input, VGA output, and a serial interface. Each peripheral is connected to the PowerPC via a bus system. We presented the setup for the timing measurements. The PowerPC has a 64-bit time register that increments at the system clock speed, which is 100 MHz. By reading the time register value before and after an operation, the time occupied by that operation is measured.
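The measurement principle can be sketched in a few lines of C. This is a minimal illustration under stated assumptions: the function name `elapsed_us` and the constant `TIMEBASE_HZ` are hypothetical, and the platform-specific read of the 64-bit time register is deliberately left out; only the conversion of two readings into elapsed time at 100 MHz is shown.

```c
#include <stdint.h>

/* Assumption: the time register counts at the 100 MHz system clock. */
#define TIMEBASE_HZ 100000000ULL

/* Converts two time-register readings into elapsed microseconds.
 * The unsigned subtraction also gives the right result if the counter
 * wrapped around once between the two readings. */
static uint64_t elapsed_us(uint64_t start_ticks, uint64_t end_ticks)
{
    uint64_t ticks = end_ticks - start_ticks;
    return ticks / (TIMEBASE_HZ / 1000000ULL);   /* 100 ticks per microsecond */
}
```

For example, a difference of 860,000 ticks corresponds to 8600 µs, i.e., 8.6 ms.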

The timing results showed that the GSM encoder and decoder were the most time-consuming operations in the Talking state. The next step was designing hardware to accelerate the GSM encoder and decoder. The hardware design had two requirements: first, the hardware should give the same result as the software; second, the hardware accelerators should not use more than 10% of the slices. The second requirement is a guideline rather than a hard limit.

The initial approach entailed the design of a complete GSM decoder. This would be an optimal solution with respect to communication overhead, since only the audio samples and the coded data have to be transported. However, the synthesis results showed that the design used up to 28% of the slices. This demonstrated that a design comprising both an encoder and a decoder would never fit within 10% of the slices.

The next approach was selecting a part of the encoder and decoder algorithm for implementation in hardware. Within the GSM encoder, the Short Term Analysis Filter and the Long Term Predictor were ported to hardware, and within the GSM decoder, the Short Term Synthesis Filter was ported to hardware. The Short Term Filters were accelerated by implementing the lattice filters in hardware. The Short Term Analysis Filter and the Short Term Synthesis Filter used 7% of the slices each. Because the filters are nearly identical, they were combined in one hardware accelerator that used 8% of the slices. This optimization saved 6% of the slices. The compute-intensive cross-correlation function was implemented in hardware to accelerate the Long Term Predictor. This hardware accelerator used 3% of the slices. The hardware accelerators together use 11% of the slices. This brings the total for the entire project to 47% of the slices, which leaves enough space for future extensions.


4 Experimental Results

In Chapter 3, the implementation details, hardware accelerator designs, and the measuring methods are described. This chapter presents the measurement outcomes. These outcomes depend on the DVS Phone's state. We distinguish the following states: Idle, Ringing, and Talking. In the Idle state, the DVS Phone waits for any action from the user or from another User Agent via the Internet. In the Ringing state, it waits for the invitee to accept or decline the INVITation or for the inviter to cancel the call setup. In the Talking state, it has established a connection with another User Agent. For each state, the processor usage was measured. In the Idle and Ringing states, the processor usage was not interesting; it was most interesting when the DVS Phone was in the Talking state. As a result, the experimental outcomes presented apply to the Talking state.

The processor usage and timing measurements are printed directly on the VGA display and are therefore easily captured. When a connection is established, the values are updated every second. We observed that the measurements fluctuate within 1%. Therefore, the presented measurements are values averaged over 5 samples.

Section 4.1 presents the processor usage and profiling results for the pure software version. Section 4.2 presents the processor usage and profiling results using the hardware acceleration. The speedup is calculated for the GSM decoder hardware accelerator and for the selected parts of the GSM encoder and decoder. Finally, the overall speedup is calculated. Section 4.3 presents calculations on the communication overhead of the implemented hardware accelerators. Section 4.4 concludes this chapter.

4.1 Timing results software version

This section presents the processor usage and profiling results of the pure software version. The results will justify the choice to accelerate the GSM encoding/decoding utilizing reconfigurable hardware. The processor usage of the pure software version is depicted in Figure 4.1.

We observe that the GSM encoding occupies 47% and the GSM decoding occupies 24% of the processor time. We infer that the GSM encoding/decoding is responsible for 71% of the processor usage. Subsequently, we performed detailed profiling of the GSM encoding/decoding. The results are depicted in Figure 4.2. Figure 4.2(a) depicts a detailed profile of the GSM encoder and Figure 4.2(b) depicts a detailed profile of the GSM decoder.

[Figure 4.1 (bar chart, processor usage per category): Idle 19%, SIP 6%, Unpack 0%, Unwrap RTP 0%, GSM Decode 24%, Playback 2%, Record 2%, GSM Encode 47%, Wrap RTP 0%, Pack 0%.]

Figure 4.1: Processor usage

[Figure 4.2 (bar charts, time in ms): (a) Encoder time measurement, with bars for Preprocess, LPC Analysis, Short Term Analysis Filter, Long Term Predictor, RPE Encoding, and Total; (b) Decoder time measurement, with bars for RPE Decoding, Long Term Synthesis Filtering, Short Term Synthesis Filter, Postprocessing, and Total.]

Figure 4.2: GSM profiling

When observing the GSM encoder profile, we notice that the Long Term Predictor consumes 4 ms, which is almost half the encoding time, and the Short Term Analysis Filter consumes 2.4 ms, which is about a quarter of the encoding time. The Long Term Predictor and the Short Term Analysis Filter together occupy 75% of the encoding time, which is 8.6 ms.

When observing the GSM decoder profile, we notice that the Short Term Synthesis Filter consumes 3.8 ms, which is about 80% of the GSM decoding time of 4.6 ms.

As a result, we selected three parts of the GSM encoder/decoder that are worthwhile to accelerate, i.e., the Long Term Predictor, the Short Term Analysis Filter, and the Short Term Synthesis Filter. Subsequently, we are able to predict the speedup that would be obtained if these parts were accelerated infinitely, i.e., if their execution time were reduced to zero. The GSM encoding speedup (S) would become:

\[
S_{\mathrm{GSMencoder}} = \frac{T_{\mathrm{original}}}{T_{\mathrm{accelerated}}} = \frac{8.6\ \mathrm{ms}}{2.2\ \mathrm{ms}} = 3.9 \tag{4.1}
\]

The GSM decoding speedup (S) will become:

\[
S_{\mathrm{GSMdecoder}} = \frac{T_{\mathrm{original}}}{T_{\mathrm{accelerated}}} = \frac{4.6\ \mathrm{ms}}{0.78\ \mathrm{ms}} = 5.9 \tag{4.2}
\]

This infinite acceleration corresponds to a 100% acceleration and an infinite speedup of the selected parts. In reality, the calculation takes time, as does the communication for data exchange and for controlling the process. The gained speedup of the selected parts will therefore be finite and the acceleration will be less than 100%. Measuring the processor usage also adds overhead. The exact results are presented in the next section.
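The upper bound used in Equations 4.1 and 4.2 can be written as a small helper. The following is a minimal sketch; the function name `speedup_upper_bound` is hypothetical. It takes the total software time and the time spent in the parts that are moved to hardware, and assumes those parts take no time at all after acceleration.

```c
/* Upper bound on the speedup when the accelerated parts take no time.
 * t_total:       measured time of the pure software version (ms)
 * t_accelerable: time spent in the parts moved to hardware (ms)
 * Returns 0.0 for invalid input. */
static double speedup_upper_bound(double t_total, double t_accelerable)
{
    double t_remaining = t_total - t_accelerable;
    if (t_total <= 0.0 || t_remaining <= 0.0)
        return 0.0;
    return t_total / t_remaining;
}
```

With the encoder figures (8.6 ms in total, 4.0 ms + 2.4 ms accelerable) this gives roughly 3.9, as in Equation 4.1. With the rounded decoder figures (4.6 ms in total, 3.8 ms accelerable) it gives roughly 5.75, slightly below the 5.9 of Equation 4.2, which is based on a remaining time of 0.78 ms rather than the rounded difference of 0.8 ms.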


4.2 Hardware acceleration results

First, the hardware acceleration results of the first approach, the complete GSM hardware decoder, are presented. Second, the measurement results of the partially accelerated GSM encoder and decoder are presented.

The GSM hardware decoder is able to decode one frame in 136 µs. The overall speedup for the GSM decoder is:

\[
S_{\mathrm{GSMdecoder}} = \frac{T_{\mathrm{original}}}{T_{\mathrm{accelerated}}} = \frac{4.6\ \mathrm{ms}}{0.136\ \mathrm{ms}} = 34 \tag{4.3}
\]

This is a promising result; however, as stated earlier, the implementation does not meet the size requirement. The second approach is discussed in the following.

In the previous section, we observed that the Long Term Predictor and the Short Term Analysis Filter occupied 75% of the encoding time and that the Short Term Synthesis Filter occupied about 80% of the decoding time. Consequently, two hardware accelerator IP cores were created: Cross Correlation and Dual Lattice Filter. The measurement results of these hardware accelerators are presented in this section.

Figure 4.3 depicts the processor usage when hardware acceleration is activated. The processor usage for GSM encoding and decoding together drops to approximately 29% (22% and 7%, respectively), and the processor is idle approximately 50% of the time.

[Figure 4.3 (bar chart, processor usage per category): Idle 49%, SIP 16%, Unpack 0%, Unwrap RTP 0%, GSM Decode 7%, Playback 2%, Record 2%, GSM Encode 22%, Wrap RTP 1%, Pack 0%.]

Figure 4.3: Processor usage with hardware acceleration

Figure 4.4(a) depicts a detailed profile of the GSM encoder. Two parts of the algorithm are hardware accelerated, i.e., the Short Term Analysis Filter and the Long Term Predictor. Figure 4.4(b) depicts a detailed profile of the GSM decoder. One part of the algorithm is hardware accelerated, i.e., the Short Term Synthesis Filter.

We observed that the Short Term Analysis Filter occupies 2.5 ms without hardware acceleration and 0.30 ms with hardware acceleration. We achieved a speedup of:

\[
S_{\mathrm{ShortTermAnalysis}} = \frac{T_{\mathrm{original}}}{T_{\mathrm{accelerated}}} = \frac{2.5\ \mathrm{ms}}{0.30\ \mathrm{ms}} = 8.3 \tag{4.4}
\]

The same calculation is performed for the Long Term Predictor, which was accelerated by hardware from 4.0 ms to 0.72 ms, which is a speedup S of:

\[
S_{\mathrm{LongTermPredictor}} = \frac{T_{\mathrm{original}}}{T_{\mathrm{accelerated}}} = \frac{4.0\ \mathrm{ms}}{0.72\ \mathrm{ms}} = 5.6 \tag{4.5}
\]


[Figure 4.4 (bar charts, time in ms): (a) Encoder time measurement, with bars for Preprocess, LPC Analysis, Short Term Analysis Filter, Long Term Predictor, RPE Encoding, and Total; (b) Decoder time measurement, with bars for RPE Decoding, Long Term Synthesis Filtering, Short Term Synthesis Filter, Postprocessing, and Total.]

Figure 4.4: GSM profiling with hardware acceleration

The acceleration of the Short Term Analysis Filter and the Long Term Predictor has an impact on the total GSM encoding time. Without hardware acceleration, the GSM encoder occupies 8.6 ms, and with hardware acceleration, it occupies 3.2 ms. The overall speedup for the GSM encoder is:

\[
S_{\mathrm{GSMencoder}} = \frac{T_{\mathrm{original}}}{T_{\mathrm{accelerated}}} = \frac{8.6\ \mathrm{ms}}{3.2\ \mathrm{ms}} = 2.7 \tag{4.6}
\]

For the GSM decoder, only the Short Term Synthesis Filter is hardware accelerated. Without hardware acceleration, the Short Term Synthesis Filter consumes 3.8 ms, and with hardware acceleration, it consumes 0.29 ms, which is a speedup of:

\[
S_{\mathrm{ShortTermSynthesis}} = \frac{T_{\mathrm{original}}}{T_{\mathrm{accelerated}}} = \frac{3.8\ \mathrm{ms}}{0.29\ \mathrm{ms}} = 13 \tag{4.7}
\]

For the GSM decoder, the total execution time is reduced from 4.6 ms to 1.1 ms, which is an overall speedup of the GSM decoder of:

\[
S_{\mathrm{GSMdecoder}} = \frac{T_{\mathrm{original}}}{T_{\mathrm{accelerated}}} = \frac{4.6\ \mathrm{ms}}{1.1\ \mathrm{ms}} = 4.2 \tag{4.8}
\]
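As a cross-check on these figures, the total time after acceleration can be estimated from the per-part measurements by subtracting the time each accelerated part saves. The sketch below is only an illustration; the type `accelerated_part` and the function `predicted_total_ms` are hypothetical names, and the estimate ignores any effect the accelerators may have on the remaining software.

```c
#include <stddef.h>

typedef struct {
    double t_software_ms;  /* time of the part in the pure software version */
    double t_hardware_ms;  /* time of the same part with the accelerator    */
} accelerated_part;

/* Estimates the total time after acceleration by subtracting the
 * per-part savings from the measured software total. */
static double predicted_total_ms(double t_total_ms,
                                 const accelerated_part *parts, size_t n)
{
    double t = t_total_ms;
    for (size_t i = 0; i < n; i++)
        t -= parts[i].t_software_ms - parts[i].t_hardware_ms;
    return t;
}
```

For the encoder, 8.6 ms minus the savings of the Short Term Analysis Filter (2.5 ms to 0.30 ms) and the Long Term Predictor (4.0 ms to 0.72 ms) gives about 3.1 ms, close to the measured 3.2 ms. For the decoder, 4.6 ms minus the saving of the Short Term Synthesis Filter (3.8 ms to 0.29 ms) gives about 1.1 ms, matching the measurement.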

4.3 Communication overhead

As other studies have shown [9], data transfer between software and hardware can be the bottleneck for the speedup of a system. In this section, calculations are presented that quantify the communication overhead. First, the communication overhead of the GSM hardware decoder is determined and, second, that of the partially accelerated encoder and decoder. The communication overhead (O) is calculated according to the following formula:

\[
O = \frac{T_{\mathrm{Communication}}}{T_{\mathrm{TotalExecution}}} \tag{4.9}
\]

Here, $T_{\mathrm{Communication}}$ is the time needed for communication. This time is acquired by subtracting the time needed for the actual calculation from the total execution time of the process, $T_{\mathrm{TotalExecution}}$. The time needed for the calculation is determined by examining the simulation result.

The GSM hardware decoder occupied 136 µs for decoding one frame, of which 13 µs is used for the actual decoding. The communication overhead was:

\[
O = \frac{T_{\mathrm{Communication}}}{T_{\mathrm{TotalExecution}}} = \frac{0.123\ \mathrm{ms}}{0.136\ \mathrm{ms}} = 90\% \tag{4.10}
\]

Consequently, only 10% of the execution time is used for the actual decoding. The communication overhead is relatively large.

The Short Term Analysis occupied 0.30 ms, of which 0.048 ms is used for the calculation. The communication overhead was:

\[
O = \frac{T_{\mathrm{Communication}}}{T_{\mathrm{TotalExecution}}} = \frac{0.252\ \mathrm{ms}}{0.30\ \mathrm{ms}} = 84\% \tag{4.11}
\]

The Long Term Predictor occupied 0.72 ms, of which 0.091 ms is used for the calculation. The communication overhead was:

\[
O = \frac{T_{\mathrm{Communication}}}{T_{\mathrm{TotalExecution}}} = \frac{0.629\ \mathrm{ms}}{0.72\ \mathrm{ms}} = 87\% \tag{4.12}
\]

The overall communication overhead for the GSM encoder was:

\[
O = \frac{T_{\mathrm{Communication}}}{T_{\mathrm{TotalExecution}}} = \frac{0.881\ \mathrm{ms}}{3.2\ \mathrm{ms}} = 28\% \tag{4.13}
\]

The Short Term Synthesis occupied 0.29 ms, of which 0.052 ms is used for the calculation. The communication overhead was:

\[
O = \frac{T_{\mathrm{Communication}}}{T_{\mathrm{TotalExecution}}} = \frac{0.238\ \mathrm{ms}}{0.29\ \mathrm{ms}} = 82\% \tag{4.14}
\]

The overall communication overhead for the GSM decoder was:

\[
O = \frac{T_{\mathrm{Communication}}}{T_{\mathrm{TotalExecution}}} = \frac{0.238\ \mathrm{ms}}{1.1\ \mathrm{ms}} = 22\% \tag{4.15}
\]

We infer that the relative communication overhead is larger for the GSM hardware decoder (90%) than for the partially accelerated GSM decoder (22%). In contrast, the absolute communication overhead is larger for the partially accelerated GSM decoder (0.238 ms) than for the GSM hardware decoder (0.123 ms).
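The overhead calculation used throughout this section can be captured in a small helper. This is a minimal sketch; the function name `communication_overhead` is hypothetical, and the inputs are the total execution time and the pure calculation time as read from the measurements and the simulation, respectively.

```c
/* Fraction of the total execution time spent on communication:
 * (t_total - t_calculation) / t_total.
 * Returns -1.0 for invalid input. */
static double communication_overhead(double t_total_ms, double t_calc_ms)
{
    if (t_total_ms <= 0.0 || t_calc_ms < 0.0 || t_calc_ms > t_total_ms)
        return -1.0;
    return (t_total_ms - t_calc_ms) / t_total_ms;
}
```

Applied to the values above, `communication_overhead(0.136, 0.013)` yields about 0.90 for the GSM hardware decoder and `communication_overhead(0.29, 0.052)` about 0.82 for the Short Term Synthesis.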

4.4 Conclusions

We observed that the GSM encoder and GSM decoder used 8.6 ms and 4.6 ms, respectively, in the Talking state. That was 47% and 24% of the processor time, respectively. These operations were selected for hardware acceleration.

The first approach was the GSM hardware decoder. This hardware accelerator achieved a speedup of 34 times for the GSM decoder; however, the size constraints were not met. Therefore, this approach was abandoned.

The second approach was selecting a part of the GSM encoder and decoder. The subfunctions within the GSM encoder that were implemented in hardware were the Short Term Analysis Filter and the Long Term Predictor. These functions were sped up by a factor of 8.3 and 5.6, respectively. The subfunction within the GSM decoder that was implemented in hardware was the Short Term Synthesis Filter. This function was sped up by a factor of 13. The overall speedup for the GSM encoder and decoder was 2.7 and 4.2, respectively.


5 Conclusions

We designed and implemented a Voice-over IP phone with minimal requirements. We stated in Chapter 1 that the minimal requirements were the capability of connecting to other VoIP phones and the use of the Internet for voice data transport. These two requirements were met.

Section 5.1 presents a summary of the conclusions of previous chapters. Section 5.2 elaborates on the main contributions of this thesis. Section 5.3 presents recommendations for further research on the DVS Phone.

5.1 Summary

In Chapter 2, we presented the basic knowledge needed to understand the context of this thesis. Signaling is needed in order to find other VoIP phones on the Internet, and the SIP protocol has been chosen for implementation. Signaling fulfills the first minimal requirement of connecting the DVS Phone to other VoIP phones. The second minimal requirement was that the voice data is transported using the Internet. This could be done by packetizing raw samples; however, problems emerge. Sending raw samples consumes much bandwidth and the receiving side has to know the format of the received data. The latter problem arises even when sending raw samples, since sample values can be represented in multiple ways. Another problem is that the Internet cannot guarantee the in-order arrival of packets. The Real-time Transport Protocol (RTP) offers a solution for these problems. RTP numbers the packets in the order sent and specifies the payload type. With SIP and RTP, the minimal requirements of the VoIP phone are met.
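How RTP carries the sequence number and the payload type can be illustrated by writing the fixed 12-byte RTP header defined in RFC 3550. The sketch below is an illustration only and is not taken from the DVS Phone's rtp.c; the function name `rtp_write_header` is hypothetical. The payload type value 3 for GSM comes from the RTP audio/video profile (RFC 3551).

```c
#include <stdint.h>

#define RTP_HEADER_LEN 12
#define RTP_PT_GSM     3   /* static payload type for GSM (RFC 3551) */

/* Writes the fixed RTP header in network byte order. Version is 2;
 * padding, extension, CSRC count, and the marker bit are all zero. */
static void rtp_write_header(uint8_t out[RTP_HEADER_LEN], uint8_t payload_type,
                             uint16_t seq, uint32_t timestamp, uint32_t ssrc)
{
    out[0]  = 0x80;                          /* V=2, P=0, X=0, CC=0        */
    out[1]  = payload_type & 0x7F;           /* M=0, 7-bit payload type    */
    out[2]  = (uint8_t)(seq >> 8);           /* sequence number: ordering  */
    out[3]  = (uint8_t)(seq & 0xFF);
    out[4]  = (uint8_t)(timestamp >> 24);    /* sampling-clock timestamp   */
    out[5]  = (uint8_t)(timestamp >> 16);
    out[6]  = (uint8_t)(timestamp >> 8);
    out[7]  = (uint8_t)(timestamp & 0xFF);
    out[8]  = (uint8_t)(ssrc >> 24);         /* synchronization source id  */
    out[9]  = (uint8_t)(ssrc >> 16);
    out[10] = (uint8_t)(ssrc >> 8);
    out[11] = (uint8_t)(ssrc & 0xFF);
}
```

The receiver can use the sequence number to restore the packet order and the payload type to select the matching decoder.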

In order to save bandwidth, voice compression is performed. The GSM codec is chosen because the algorithm is not patented and is therefore freely usable. The GSM codec is widely used and well documented.

Working on local area networks (LANs) can cause some problems when using VoIP, e.g., accessing a host in a LAN from the public Internet. STUN can be a solution for these problems. Extra features were added to the DVS Phone to finish the phone as a product. These features are DHCP and DNS. DHCP enhances the plug-and-play behavior of the DVS Phone, because it is able to automatically retrieve an IP address and other necessary network addresses. DNS is needed to convert names like wonderland.com to an IP address.

In Chapter 3, we presented the implementation details for the software as well as the hardware. The XUPV2P board was described in detail, including the implemented peripherals, e.g., Ethernet, sound input/output, keyboard input, VGA output, and a serial interface. Each peripheral is connected to the PowerPC via a bus system. We presented the setup for the timing measurements. The PowerPC has a 64-bit time register that increments at the system clock speed, which is 100 MHz. By reading the time register value before and after an operation, the time occupied by that operation is measured.

The timing results showed that the GSM encoder and decoder were the most time-consuming operations in the Talking state. The next step was designing hardware to accelerate the GSM encoder and decoder. The hardware design had two requirements: first, the hardware should give the same result as the software; second, the hardware accelerators should not use more than 10% of the slices. The second requirement is a guideline rather than a hard limit.

The initial approach entailed the design of a complete GSM decoder. This would be an optimal solution with respect to communication overhead, since only the audio samples and the coded data have to be transported. However, the synthesis results showed that the design used up to 28% of the slices. This demonstrated that a design comprising both an encoder and a decoder would never fit within 10% of the slices.

The next approach was selecting a part of the encoder and decoder algorithm for implementation in hardware. Within the GSM encoder, the Short Term Analysis Filter and the Long Term Predictor were ported to hardware, and within the GSM decoder, the Short Term Synthesis Filter was ported to hardware. The Short Term Filters were accelerated by implementing the lattice filters in hardware. The Short Term Analysis Filter and the Short Term Synthesis Filter used 7% of the slices each. Because the filters are nearly identical, they were combined in one hardware accelerator that used 8% of the slices. This optimization saved 6% of the slices. The compute-intensive cross-correlation function was implemented in hardware to accelerate the Long Term Predictor. This hardware accelerator used 3% of the slices. The hardware accelerators together use 11% of the slices. This brings the total for the entire project to 47% of the slices, which leaves enough space for future extensions.

In Chapter 4, we observed that the GSM encoder and GSM decoder used 8.6 ms and 4.6 ms, respectively, in the Talking state. That was 47% and 24% of the processor time, respectively. These operations were selected for hardware acceleration.

The first approach was the GSM hardware decoder. This hardware accelerator achieved a speedup of 34 times for the GSM decoder; however, the size constraints were not met. Therefore, this approach was abandoned.

The second approach was selecting a part of the GSM encoder and decoder. The subfunctions within the GSM encoder that were implemented in hardware were the Short Term Analysis Filter and the Long Term Predictor. These functions were sped up by a factor of 8.3 and 5.6, respectively. The subfunction within the GSM decoder that was implemented in hardware was the Short Term Synthesis Filter. This function was sped up by a factor of 13. The overall speedup for the GSM encoder and decoder was 2.7 and 4.2, respectively.

5.2 Main contributions

We implemented a VoIP phone with minimal requirements. These requirements were the ability to establish a connection with other VoIP phones and to exchange voice data over this connection using the Internet. The main contributions of this thesis are:


• A fully functional VoIP phone with minimal requirements has been designed and implemented in hardware.

• The VoIP phone has been profiled to determine the most time-consuming processes. We observed that the GSM encoder and GSM decoder were the most time-consuming processes in the Talking state.

• The first design for a hardware accelerator comprised the implementation of a complete GSM decoder in hardware, achieving a speedup of 34 times. However, the design used 28% of the slices, exceeding the maximum number of slices reserved for hardware accelerators.

• The GSM encoder and GSM decoder were analyzed and profiled in detail. We observed that the Short Term Analysis Filter, the Short Term Synthesis Filter, and the Long Term Predictor were the most time-consuming functions, and they have been ported to reconfigurable hardware.

• For the GSM encoder, the Short Term Analysis Filter was sped up 8.3 times and the Long Term Predictor was sped up 5.6 times. For the GSM decoder, the Short Term Synthesis Filter was sped up 13 times.

• The overall speedup for the GSM encoder was 2.7 times and the GSM decoder was accelerated 4.2 times.

• The Long Term Predictor occupied 3% of the slices. Optimization reduced the number of slices occupied by the Short Term Filters from 14% to 8%.

• Additional functionality was necessary to allow convenient operation of the VoIP phone: DHCP for network plug-and-play behavior and DNS to make it unnecessary for the user to type IP addresses.

5.3 Recommendations for further research

This section discusses the possibilities for future improvements. The most urgent improvement is compliance with the newer SIP standard as defined in RFC 3261. SIP servers still have to support the older SIP standard; however, it is uncertain how long this will hold. Further improvements are discussed at three levels: functionality, user-friendliness, and connectivity. They are presented in the following.

On the functional level, as already stated, more features could be added. Examples are:

• Video conferencing support. An option could be added to connect a webcam to capture video data and to use the VGA output for displaying video data.

• Encryption of the speech data. To make sure no one is eavesdropping, a secure connection, e.g., PGP (Pretty Good Privacy), is essential.

• Advanced playback buffer control. The quality of voice largely depends on the QoS parameter jitter. A playback buffer is needed to compensate for jitter; however, a large playback buffer introduces delay. Much investigation has already been performed on the optimal size of the playback buffer. The DVS Phone has a fixed playback buffer that could be improved by implementing an advanced scheme.

• Voice Activity Detection (VAD). Compressing codecs are one way to save bandwidth; in addition, more bandwidth could be saved by only sending voice data when the user talks.

Regarding user-friendliness, the following additions would make using the DVS Phone more convenient:

• No configuration parameters are stored at the moment. At startup, the current implementation loads a standard configuration that is coded in the FPGA bitstream. A future version of the DVS Phone should store configuration parameters, in order to free the user from entering these parameters at each startup. The System ACE controller could be utilized to store this preference data.

• If the DVS Phone is INVITEd, the user is informed through an LED on the XUP board. Adding an audio signal (e.g., a bell) is a more sophisticated way to notify the user of an incoming call.

• The user interface is very basic and could use refinement, e.g., using multiple screens or menus may offer a better overview for the user.

On the connectivity level, the Ethernet interface can be replaced by a wireless Internet connection to perform research for mobile VoIP purposes.


Bibliography

[1] http://spie.org/Conferences/Calls/01/itcom/itcom.pdf, ITCOM 2001, 2001, p. 11.

[2] Y. Amir, C. Danilov, S. Goose, D. Hedqvist, and A. Terzis, 1-800-OVERLAYS: Using Overlay Networks to Improve VoIP quality.

[3] C. Bormann and J. Degener, ftp://ftp.cs.tu-berlin.de/pub/local/kbs/tubmik/gsm/gsm-1.0.10.tar.gz.

[4] C. Boutremans, G. Iannaccone, and C. Diot, Impact of link failures on VoIP performance.

[5] X. Chen, C. Wang, D. Xuan, Z. Li, Y. Min, and W. Zhao, Survey on QoS Management of VoIP, Proceedings of the 2003 International Conference on Computer Networks and Mobile Computing, 2003.

[6] Cisco, Understanding Codec complexity, Hardware Support, MOS, and Negotiation, Document ID: 14069.

[7] K. Compton and S. Hauck, Reconfigurable computing: a survey of systems and software, ACM Computing Surveys (CSUR), Volume 34, Issue 2, June 2002.

[8] S.D. Cotofana, Embedded systems lab. course (et3301), Lab course in embedded systems design taught at TU Delft.

[9] G. de Goede, Accelerating the XviD IDCT on DAMP, Master's thesis, Department of Computer Engineering, TU Delft, November 2004.

[10] ETSI (European Telecommunication Standard Institute), GSM: Digital Cellular Telecommunications System (Phase 2); Full Rate Speech; part 2: Transcoding (GSM 06.10 version 4.2.0), Aug 2000.

[11] T. Eyers and H. Schulzrinne, Predicting Internet Telephony Call Setup Delay, Proc. 1st IP-Telephony Wksp., Berlin, Germany, Apr 2000.

[12] H. Furuya, S. Nomoto, H. Yamada, N. Fukumoto, and F. Sugaya, Experimental Investigation of the Relationship between IP Network Performances and Speech Quality of VoIP.

[13] S. Garg and M. Kappes, Can I add a VoIP call?, IEEE International Conference on Communications, May 2003, pp. 779–783.

[14] M. Handley and V. Jacobson, RFC 2327: Session description protocol, April 1998, http://www.ietf.org/rfc/rfc2327.txt.

[15] M. Handley, H. Schulzrinne, G. Camarillo, A. Johnston, J. Peterson, R. Sparks, E. Schooler, and J. Rosenberg, RFC 3261: Session initiation protocol, June 2002, http://www.ietf.org/rfc/rfc3261.txt.

[16] M. Handley, H. Schulzrinne, E. Schooler, and J. Rosenberg, RFC 2543: Session initiation protocol, March 1999, http://www.ietf.org/rfc/rfc2543.txt.

[17] A. Hashad, K. Shehata, M.A. Dahad, and K.M.S.E. El-Dars, Design and implementation of a signaling unit for a simple IP phone, 2004, pp. 851–854.

[18] P.K. Jawahar and V. Vaidehi, Reconfigurable Architecture to enhance QoS for VoIP networks.

[19] M.J. Karam and F.A. Tobagi, Analysis of the delay and jitter of voice traffic over the internet, IEEE INFOCOM 2001, July 2001, pp. 824–833.

[20] K. Katsoulakis, T. Arslan, T. Kirkham, and S. Khawam, A low-power reconfigurable datapath for advanced speech coding algorithms, Proceedings of the 19th IEEE International Parallel and Distributed Processing Symposium, 2005.

[21] A. Markopoulou, F.A. Tobagi, and M.J. Karam, Assessment of VoIP Quality over Internet backbones, IEEE INFOCOM, 2002, pp. 150–159.

[22] N. Mobini, B. Bahdat, and M.H. Radfar, An FPGA based implementation of G.729, IEEE International Symposium on Circuits and Systems, May 2005, pp. 3571–3574.

[23] M. Narbutt and L. Murphy, VoIP Playout Buffer Adjustment using Adaptive Estimation of Network Delays, 18th Int. Teletraffic Congress ITC-18, Berlin, Germany, Sept 2003, pp. 1171–1180.

[24] M. Narbutt and L. Murphy, A New VoIP Adaptive Playout Algorithm, IEEE Trans. on Broadcasting, Vol. 50, No. 1, Mar 2004, pp. 1–10.

[25] M. Narbutt and L. Murphy, Improving Voice over IP Subjective Call Quality, IEEE Communications Letters, Vol. 8, No. 5, May 2004, pp. 308–310.

[26] H. Noori, H. Pedram, A. Akbari, and S. Sheidaei, FPGA implementation of a DSP core for full rate and half rate GSM vocoders, Proceedings of the 12th International Conference on Microelectronics, Oct 2000, pp. 273–276.

[27] H. Schulzrinne, S. Casner, R. Frederick, and V. Jacobson, RFC 3550: Real-time transport protocol, July 2003, http://www.ietf.org/rfc/rfc3550.txt.

[28] International Telecommunication Union, Methods for subjective determination of transmission quality, ITU-T Recommendation P.800, Nov 1996.

[29] M. van den Braak and S. Wong, FPGA implementation of Voice-over IP, Proceedings of the 16th Annual Workshop on Circuits, Systems and Signal Processing, Nov 2005, pp. 338–342.


Appendix A. Software description

This appendix presents the details of the software and should provide the information needed to implement future extensions. Section A.1 presents the connections between the source files. Section A.2 describes the main function and the main loop. Section A.3 describes all source files and the functions they define. Section A.3.1 presents the linker script.

A.1 Source file deployment

This section presents an overview of the source files and their links. The arrows follow the rule: a source file points to the other source files from which it calls one or more functions.

[Figure: dependency graph of the source files main.c, sip.c, ac97.c, internet.c, digcalc.c, md5.c, io.c, lib.c, gsm_ifc, rtp.c, ps2.c, xps2_l.c, stun.c, systime.c, dhcp.c, g711.c, ntp.c, eth.c, ip.c, arp.c, udp.c, xilsock.c, icmp.c, and gsm_src.]


A.2 Main loop

In this section, the main function is depicted, containing a main loop. The code depicted below gives an overview of the software architecture. Note that the code is strongly simplified; its only purpose is to give future programmers insight into the structure. Some arguments to functions are omitted and variable names are shortened.

void main(void)
{
    /* One-time initialization of all subsystems */
    InitTimers();
    InitInternet();
    DHCPInit();
    DNSInit();
    InitAC97();
    InitIO();
    GSMInit();
    NTPInit();
    Tekenscherm();                  /* draws the settings and the banner */

    while (1)
    {
        /* SIP signalling */
        ChangeProcUsageCategory(TIM_SIP);
        DoSIPRegistration();
        DoSIPConnection();
        ChangeProcUsageCategory(TIM_IDLE);

        /* Transmit path: one frame of 160 recorded samples is available */
        if (XAC97_getOutFIFOLevel(XPAR_OPB_AC97_0_BASEADDR) >= 160)
        {
            ChangeProcUsageCategory(TIM_TALKRECORD);
            for (i = 0; i < 160; i++)
                recSamples[i] = XAC97_mGetOutFifoData(XPAR_OPB_AC97_0_BASEADDR);

            ChangeProcUsageCategory(TIM_TALKENCODE);
            if (used_codec == CODEC_G711)
            {
                G711Encode(recSamples, &rtpdata, 160);
                ChangeProcUsageCategory(TIM_TALKWRAPRTP);
                RTPSendPacket(rtpdata, 160, rtp_dest_addr);   /* 160 payload bytes */
            }
            else if (used_codec == CODEC_GSM)
            {
                GSMEncode(recSamples, &rtpdata);
                ChangeProcUsageCategory(TIM_TALKWRAPRTP);
                RTPSendPacket(rtpdata, 33, rtp_dest_addr);    /* 33 payload bytes */
            }
        }
        ChangeProcUsageCategory(TIM_IDLE);

        /* Receive path: an RTP packet has arrived */
        if ((len = get_udp_data(rtpSocket)) > 0)
        {
            ChangeProcUsageCategory(TIM_TALKUNWRAPRTP);
            DecodeRTPPacket(rtpbuffer, len, rtpdata, &wavedatalen, &payloadtype);

            ChangeProcUsageCategory(TIM_TALKDECODE);
            if (payloadtype == CODEC_G711)
                G711Decode(rtpdata, &pbsamples);
            else if (payloadtype == CODEC_GSM)
                GSMDecode(rtpdata, &pbsamples);

            ChangeProcUsageCategory(TIM_TALKPLAYBACK);
            for (i = 0; i < number_of_samples; i++)
                XAC97_mSetInFifoData(pbsamples[i]);
        }
        ChangeProcUsageCategory(TIM_IDLE);

        /* Housekeeping */
        do_kbinput();
        do_dhcp();
        VUPeak();

        /* Once per second: refresh the statistics and the screen */
        if (_now != NowSeconds())
        {
            _now = NowSeconds();
            UpdateProcUsageStat();
            Tekenscherm();
        }
    }
}
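The constants in the main loop fix the packet timing. As a small worked example, the sketch below derives the frame duration and the payload bit rate from the 160-sample frame, the 8 kHz sample rate configured in InitAC97(), and the payload sizes passed to RTPSendPacket(). The helper names are ours and do not appear in the thesis sources.

```c
#include <assert.h>

#define SAMPLE_RATE_HZ    8000u  /* AC97 sample rate (8 kHz) */
#define SAMPLES_PER_FRAME 160u   /* samples read per main-loop frame */

/* Duration of one audio frame in milliseconds. */
unsigned frame_duration_ms(void)
{
    return SAMPLES_PER_FRAME * 1000u / SAMPLE_RATE_HZ;
}

/* Payload bit rate in bit/s for a codec that emits payload_bytes per frame
   (hypothetical helper, for illustration only). */
unsigned payload_bitrate_bps(unsigned payload_bytes)
{
    assert(payload_bytes > 0);
    return payload_bytes * 8u * SAMPLE_RATE_HZ / SAMPLES_PER_FRAME;
}
```

With these values one frame lasts 20 ms, so the phone sends 50 RTP packets per second; the 33-byte GSM payload corresponds to 13.2 kbit/s and the 160-byte G.711 payload to 64 kbit/s, excluding RTP, UDP and IP headers.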

A.3 Functions

main.c

void OnInternetConnectionUp(void);
    This function is called when DHCP has successfully retrieved IP addresses.

void OnCallHangup(void);
    Event handler for when the contact has hung up the phone.

int main(void);
    Main function.

sip.c

void SIPInit(void);
    Initialization of SIP parameters, i.e., SIP URI, SIP server DNS lookup.

void SDPInit(char *sdp);
    Initialization of SDP. IP address, port number and used codec are filled in.

void DoSIPRegistration(void);
    Process handler for SIP registration and unregistration. Sends (and resends if necessary) registration commands. Keeps track of the expiration time of the registration and starts reregistering at half the expiration time.

void DoSIPConnection(void);
    Process handler for the SIP connection. Processes received SIP messages and handles responses.

void PrepareSIPCommand(char *buffer, char *command, char *vias, char *sip_address, char *from, char *from_tag, char *to, char *to_tag, char *callid, char *cseq, char *route, char *contact, int content_length, char *custom);
    Uses all information given to construct a SIP header for a SIP request command.

void PrepareSIPResponse(char *buffer, int response_code, char *response_text, char *vias, char *from, char *from_tag, char *to, char *to_tag, char *callid, char *cseq, char *route, char *contact, int content_length, char *custom);
    Uses all information given to construct a SIP header for a SIP response.

void PrepareSIPHeader(char *buffer, char *vias, char *from, char *from_tag, char *to, char *to_tag, char *callid, char *cseq, char *route, char *contact, int content_length, char *custom);
    Header constructor for general fields, used by PrepareSIPCommand() and PrepareSIPResponse().

void SIPRegister(void);
    Command for SIP registering.

void SIPUnregister(void);
    Command for SIP unregistering.

void SIPReregister(void);
    Command for SIP reregistering.

void SIPAcceptCall(void);
    When an invitation is received, this function accepts the invitation.

void SIPDeclineCall(void);
    When an invitation is received, this function declines the invitation.

void SIPInviteFriend(void);
    Invite friend.

void SIPCancelCall(void);
    Cancels a pending invitation to a friend.

void SIPHangUp(void);
    Hang up the phone.

void IncCSeq(void);
    Increments the counter for the CSeq field in the SIP header by one.

void SIPSendRegister(void);
    Construct and send the actual REGISTER request.

void SIPSendUnregister(void);
    Construct and send the REGISTER request with expiration time 0, so that the SIP server discards all bindings.

void SIPSendINVITE(void);
    Construct and send the INVITE request.

void SIPSendACK(void);
    Construct and send the ACK request to the friend.

void SIPSendINVITEOK(void);
    Construct and send the 200 OK response in response to an INVITE request.

void SIPSendINVITEDecline(void);
    Construct and send the Decline response in response to an INVITE request.

void SIPSendCANCEL(void);
    Construct and send the CANCEL request.

void SIPSendBYE(void);
    Construct and send the BYE request.

void SIPSendBYEOK(void);
    Construct and send the 200 OK response in response to a BYE request.

void SIPSendRinging(void);
    Construct and send the 180 Ringing response.

void GenerateRandomBranch(unsigned char *branch);
    Generates a random string of 16 characters from the set [0-9A-Za-z] for the Via field.

void ParseREGISTEROK(char *message, int len, unsigned int *expires);
    Extract the relevant information (Expires value) from the OK message in response to our REGISTER request.

void ParseSIP_INVITE(char *message, int len, unsigned char *addr, unsigned short *port);
    Extract the relevant information (Via, Call-ID, CSeq, Contact, From fields) from the INVITE message.

void ParseSIP_INVITEOK(char *message, int len, unsigned char *addr, unsigned short *port, char *cseq);
    Extract the relevant information (Via, Call-ID, CSeq, Contact, To fields) from the OK response to our INVITE request.

void ParseSDP(char *signal, int len, unsigned char *addr, unsigned short *port);
    Extract information on the IP address of our contact, the port number to use and which codecs are supported.

void ParseSIP_BYE(char *message, int len);
    Extract the relevant information (CSeq and Via fields) from the BYE request we received.

void ParseAuthentication(char *message, int len);
    If a 401 Unauthorized response is received for any request, this function is called and the response is calculated. The response is stored in the global variable 'response'.

int GetFieldParameter(char *SIPfield, char *paramname, char *fieldcont);
    Some fields can contain several parameters; for example, the WWW-Authenticate field contains a realm and a nonce parameter and the Contact field contains the expires parameter. The value of parameter paramname is returned in fieldcont.

int GetHeaderFieldNameAndContents(char *message, char **fieldname, char **fieldcont, int *linelength, int *nextline);
    Searches the next field from the beginning of 'message' and returns the field name and contents. In linelength, the number of characters in the current line is returned. This is needed to detect whether the end of the header is reached, because header and body are separated by a blank line. In nextline, the number of characters to the next line of the header is returned. This number of characters must be advanced to get to the next line.
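As an illustration of the description of GenerateRandomBranch() above, the following is a minimal sketch of how a 16-character string over [0-9A-Za-z] could be generated. It is not the thesis implementation: the function name, the use of rand() as the random source, and the added terminating NUL byte are assumptions made for this example.

```c
#include <assert.h>
#include <stdlib.h>

#define BRANCH_LEN 16

/* Fills branch[0..15] with random characters from [0-9A-Za-z] and
   NUL-terminates the string. The caller must provide BRANCH_LEN + 1 bytes.
   rand() is used here only for illustration. */
void generate_random_branch(unsigned char *branch)
{
    static const char alphabet[] =
        "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz";
    const int n = (int)(sizeof alphabet - 1);   /* 62 symbols */
    int i;

    assert(branch != NULL);
    for (i = 0; i < BRANCH_LEN; i++)
        branch[i] = (unsigned char)alphabet[rand() % n];
    branch[BRANCH_LEN] = '\0';
}
```

On an embedded target without a seeded rand(), a free-running hardware timer could serve as the source of randomness instead.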

dhcp.c

int DHCPInit(void);
    Initializes DHCP. Creates a socket and initializes the DHCP state.

int do_dhcp(void);
    Process handler for DHCP. State machine for the DHCP state. Sends and resends requests and processes received messages.

int ParseOptions(unsigned char *options, unsigned char search_opt_no, int *opt_ptr, unsigned char *opt_len);
    Extracts the returned options in the options list.

ac97.c

void InitAC97(void);
    Initializes the AC97 sound codec. Sets the right volumes and the sample frequency to 8 kHz.

void SetSoundInput(int sel);
    Selects a sound input: Microphone or Line in.

void SetSoundInputVolume(int vol);
    Set the sound input volume.

void SetSoundOutputVolume(int vol);
    Set the sound output volume.

void SetSoundInVU(int vu);
    Changes the input VU meter level.

void SetSoundInPeak(int peak);
    Changes the input peak indicator level.

void SetSoundOutVU(int vu);
    Changes the output VU meter level.

void SetSoundOutPeak(int peak);
    Changes the output peak indicator level.

void VUPeak(void);
    Function to be called regularly. After a certain time, the peak indicators are reset.

digcalc.c

void DigestCalcResponse(char *pszUserName, char *pszRealm, char *pszPassword, char *pszNonce, char *pszMethod, char *pszDigestUri, HASHHEX Response);
    Calculates the MD5 response with the given username, realm, password, nonce, method and URI.

dns.c

int DNSInit(void);
    Initializes DNS. Creates a socket for DNS.

int DNSLookup(char *servername, unsigned char *ip);
    Looks up servername at the DNS server and returns the IP address.

g711.c

void G711Decode(Xuint8 *input, Xuint16 *output, int buflen);
    Decodes 'buflen' 8-bit A-law compressed values in buffer 'input' to 16-bit PCM samples.

Xuint16 DecodeG711Sample(Xuint8 input);
    Converts one 8-bit A-law value to a 16-bit PCM sample.

void G711Encode(Xuint16 *input, Xuint8 *output, int buflen);
    Encodes 'buflen' 16-bit PCM samples in buffer 'input' to 8-bit A-law compressed values.

Xuint8 EncodeG711Sample(Xuint16 input);
    Converts one 16-bit PCM sample to an 8-bit compressed A-law value.
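To make the per-sample conversions concrete, the following sketch shows a common segment-based formulation of G.711 A-law companding. It is not taken from g711.c: the function names and the use of the standard int16_t/uint8_t types instead of the Xilinx Xuint16/Xuint8 types are assumptions for this example.

```c
#include <assert.h>
#include <stdint.h>

/* Upper bound of each of the eight A-law segments (13-bit magnitude). */
static const int seg_end[8] = {0x1F, 0x3F, 0x7F, 0xFF, 0x1FF, 0x3FF, 0x7FF, 0xFFF};

/* Compresses one 16-bit linear PCM sample to an 8-bit A-law value. */
uint8_t alaw_encode_sample(int16_t pcm)
{
    int mask, val, seg;

    if (pcm >= 0) {
        mask = 0xD5;                     /* sign bit set, even bits inverted */
        val = pcm >> 3;                  /* reduce to 13-bit magnitude */
    } else {
        mask = 0x55;                     /* even bits inverted only */
        val = (-(int)pcm - 1) >> 3;
    }

    /* Find the segment that contains the magnitude. */
    for (seg = 0; seg < 8 && val > seg_end[seg]; seg++)
        ;
    assert(seg < 8);                     /* val never exceeds 0xFFF */

    /* Segment number in bits 4..6, quantization step in bits 0..3. */
    val = (seg << 4) | ((val >> (seg < 2 ? 1 : seg)) & 0x0F);
    return (uint8_t)(val ^ mask);
}

/* Expands one 8-bit A-law value to a 16-bit linear PCM sample. */
int16_t alaw_decode_sample(uint8_t a)
{
    int t, seg;

    a ^= 0x55;                           /* undo the even-bit inversion */
    t = (a & 0x0F) << 4;
    seg = (a & 0x70) >> 4;
    if (seg == 0) {
        t += 8;
    } else {
        t += 0x108;
        t <<= (seg - 1);
    }
    return (int16_t)((a & 0x80) ? t : -t);
}
```

Decoding followed by encoding returns the original code word for all 256 A-law values, which is a convenient self-check when porting such a codec to hardware.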

gsm_int.c

void GSMInit(void);
    Initializes the GSM encoder/decoder.

void GSMEncode(Xint16 *pcm, Xuint8 *gsm_frame);
    Encodes 160 16-bit PCM samples in array *pcm to compressed GSM data in *gsm_frame.

void GSMDecode(Xuint8 *gsm_frame, Xint16 *pcm);
    Decodes compressed GSM data in array *gsm_frame to 160 16-bit PCM samples in *pcm.

icmp.c

int xilnet_icmp(unsigned char *, int);
    Dispatches a received ICMP packet.

void xilnet_icmp_echo_reply(unsigned char *, unsigned int);
    An ICMP ping request is received. This function sends an ICMP ping reply.

void xilnet_icmp_echo_request(unsigned char *ip_addr, unsigned char *buf, unsigned int len);
    Sends an ICMP echo (ping) request and starts a timer for measuring the round trip delay.

internet.c

void InitInternet(void);
    Initializes the Ethernet adapter and creates sockets for RTP, RTCP and SIP. Sets the default IP address and other addresses, if appropriate.

int init_udp_socket(struct sockaddr_in *addr, unsigned char *buf);
    Creates a UDP socket for address addr and assigns buffer *buf to the socket. If a packet is received on this socket, the data is copied to the buffer.

void recv_udp(void);
    Process handler for received Ethernet data. Polls whether any data is received and calls the right handler to eventually put the data in the right buffer.

int get_udp_data(int s, struct sockaddr* from, unsigned int *fromlen);
    Tests whether data was received on socket s. If data was received, the function returns the length of the data.

io.c

void InitIO(void);
    Initializations for VGA output and PS/2 input.

void SetChar(int x, int y, char c, unsigned char color);
    Sets the character at x,y to char c, with color color.

void DrawString(int x, int y, char *text, unsigned char color);
    Puts string 'text' on the screen at x,y with color color.

int EditField(int fieldno, char *entered_text, int max_char);
    Handles caret and text input at line fieldno. entered_text is a text buffer of size max_char that must already exist when the function is called. This is a blocking call and should not be called while the user is talking to a friend.

void DrawBanner(void);
    Draws the TU Delft logo and the author's name.

void DrawSettings(void);
    Draws all settings on the VGA screen. Calls all Update... functions.

void ClearSettingsLine(int l);
    Fills a line of settings with black characters.

void UpdateTime(void);
    Update the system time on the screen.

void UpdateLinkInfo(void);
    Update all link info on the screen.

void UpdateCodecTimeInfo(void);
    Update all measured codec process time info on the screen.

void UpdateProcUseInfo(void);
    Update the processor usage info on the screen.

void UpdateVUMeters(void);
    Update the input and output VU meters and peak detectors on the screen.

void Tekenscherm(void);
    Calls DrawSettings() and DrawBanner().

void do_io(void);
    Process handler for handling pressed keys for editing options.

void att_to_db(int att, int *db1, int *db2);
    Translates an attenuation factor to a dB value. db1 returns the integer value and db2 returns the decimals after the decimal point.

lib.c

void WriteToGPOutput(Xuint32 BaseAddress, int gpio_width);
    Sets the appropriate general-purpose I/O ports to inputs or outputs. Shows a small test pattern ('disco') for testing.

Xuint32 ReadFromGPInput(Xuint32 BaseAddress);
    Reads the current value of the general-purpose I/O ports.

void StrToInt(char *buffer, int *i);
    Converts a string to an integer.

void PrintDebugMessage(char *mes, int len);
    Prints a message on the RS232 port.

ntp.c

int NTPInit(void);
    Initializes NTP. Creates a socket for NTP.

unsigned int NTPGetTime(void);
    Sends a request to the time server; the response is returned by the function.

ps2.c

void init_ps2(void);
    Initializes PS/2 communication. Resets the PS/2 buffer.

void ps2_recv(void);
    Process handler for PS/2 received data.

int ps2readbuf(unsigned short *c);
    Reads a key from the PS/2 buffer. If the function result is other than 0, c holds the received key.

void ps2writebuf(unsigned short c);
    Writes a key to the PS/2 buffer.

rtp.c

void RTPStartSession(void);
    Initializes variables when a connection is established and an RTP session is set up.

void RTPSendPacket(unsigned char *rtp_data, int len, struct sockaddr_in *sock_addr);
    Sends an RTP packet with payload 'rtp_data' and length 'len' on socket 'sock_addr'. Every 200 RTP packets, an RTCP packet is sent with current connection statistics.

void DecodeRTPPacket(unsigned char *rtp_packet_buf, int len, unsigned char *rtpdata, int *rtpdatalen);
    When an RTP packet is received, this function is called from the main function. 'rtp_packet_buf' is a pointer to the complete RTP packet. 'rtpdata' is a pointer to a buffer that this function must copy the payload data to. This function also checks whether an RTCP packet is received and processes this data by updating the variables 'outgoing_jitter' and 'outgoing_packetloss'.
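For readers extending rtp.c, the sketch below shows how the fixed 12-byte RTP header defined in RFC 3550 can be placed in front of a codec payload. It is an illustrative helper under our own naming, not the body of RTPSendPacket(); the payload type numbers 3 (GSM) and 8 (PCMA, i.e., G.711 A-law) are the static assignments of the RTP audio/video profile.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define RTP_HEADER_LEN 12
#define RTP_PT_GSM     3    /* static payload type for GSM full rate */
#define RTP_PT_PCMA    8    /* static payload type for G.711 A-law */

/* Writes an RTP packet (fixed header without CSRC list, followed by the
   payload) into out and returns the total packet length in bytes.
   All multi-byte header fields are stored in network byte order. */
size_t rtp_build_packet(uint8_t *out, uint8_t payload_type, uint16_t seq,
                        uint32_t timestamp, uint32_t ssrc,
                        const uint8_t *payload, size_t payload_len)
{
    assert(out != NULL && payload != NULL);

    out[0] = 0x80;                            /* V=2, P=0, X=0, CC=0 */
    out[1] = (uint8_t)(payload_type & 0x7F);  /* M=0, 7-bit payload type */
    out[2] = (uint8_t)(seq >> 8);
    out[3] = (uint8_t)(seq & 0xFF);
    out[4] = (uint8_t)(timestamp >> 24);
    out[5] = (uint8_t)(timestamp >> 16);
    out[6] = (uint8_t)(timestamp >> 8);
    out[7] = (uint8_t)(timestamp);
    out[8] = (uint8_t)(ssrc >> 24);
    out[9] = (uint8_t)(ssrc >> 16);
    out[10] = (uint8_t)(ssrc >> 8);
    out[11] = (uint8_t)(ssrc);

    memcpy(out + RTP_HEADER_LEN, payload, payload_len);
    return RTP_HEADER_LEN + payload_len;
}
```

For each 160-sample frame, the caller would increment the sequence number by one and the timestamp by 160, since the RTP clock for both codecs runs at the 8 kHz sample rate.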

stun.c

int StunInit(void);
    Initializes STUN. Creates a socket for STUN.

int StunRequest(unsigned short socket_nr, unsigned short *public_source_port);
    Sends a STUN request to the STUN server that is stored in variable 'stunservername'. When no reply message is received after 8 seconds, StunRequest returns -1 as an error code. On success the function returns 0.

systime.c

void SystemTimeInit(void);
    Initializes the system time. Calls NTP to synchronize the time.

XTime Now(void);
    Returns the current time in 100th of microseconds.

int GetDateTime(int *year, int *month, int *day, int *hour, int *min, int *sec);
    Converts the current time to the date and time values for display purposes.

int NowSeconds(void);
    Converts Now() to seconds and returns the result.

timing.c

void InitTimers(void);
    Initializes and resets the registers of the timer module.

void ResetProcUsageCounters(void);
    Resets the timing values for all processes.

void StartLoadBalanceTimer(void);
    Starts the processor usage timer.

void ChangeProcUsageCategory(int cat);
    Stops timing the current process and starts timing a new process.

Xuint32 StartMiscTimer(int timer_no);
    Starts miscellaneous timer 0 or 1.

Xuint32 StopMiscTimer(int timer_no);
    Stops miscellaneous timer 0 or 1 and returns the current value.

void UpdateProcUsageStat(void);
    Update the values for the processor usage.
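The category-based profiling performed by ChangeProcUsageCategory() can be pictured with the following minimal sketch. It is an assumption about one possible structure, not the contents of timing.c: the category names, the function names, and the choice to pass the current timer value as an argument (instead of reading a hardware timer register) are ours.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical categories; the thesis uses constants such as TIM_SIP. */
enum { CAT_IDLE, CAT_SIP, CAT_ENCODE, CAT_DECODE, CAT_COUNT };

static uint32_t cat_ticks[CAT_COUNT];  /* accumulated ticks per category */
static uint32_t last_switch;           /* timer value at the last switch */
static int current_cat;

/* Clears all counters and starts timing the idle category at time now. */
void usage_reset(uint32_t now)
{
    int i;
    for (i = 0; i < CAT_COUNT; i++)
        cat_ticks[i] = 0;
    current_cat = CAT_IDLE;
    last_switch = now;
}

/* Charges the elapsed ticks to the current category and switches to cat.
   The unsigned subtraction stays correct across one timer wrap-around. */
void usage_change_category(int cat, uint32_t now)
{
    assert(cat >= 0 && cat < CAT_COUNT);
    cat_ticks[current_cat] += now - last_switch;
    current_cat = cat;
    last_switch = now;
}

/* Returns the share of category cat in percent of all accumulated ticks. */
unsigned usage_percent(int cat)
{
    uint64_t total = 0;
    int i;

    assert(cat >= 0 && cat < CAT_COUNT);
    for (i = 0; i < CAT_COUNT; i++)
        total += cat_ticks[i];
    return total ? (unsigned)((uint64_t)cat_ticks[cat] * 100u / total) : 0u;
}
```

Because every switch only costs one timer read and one addition, such bookkeeping can stay enabled in the main loop without noticeably disturbing the measurement.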

udp.c

int xilnet_udp(unsigned char*, int);
    Handles a received UDP message.

void xilnet_udp_init_conns(void);
    Initializes UDP connections.

int xilnet_udp_open_conn(unsigned short);
    Opens a UDP socket.

void xilnet_udp_header(struct xilnet_udp_conn*, unsigned char*, int);
    Fills the UDP header for the packet to be sent.

unsigned short xilnet_udp_tcp_calc_chksum(unsigned char*, int, unsigned char*, unsigned char*, unsigned short);
    Calculates a checksum for the UDP message.

xilsock.c

int xilsock_init(void);
    Initializes usage of sockets.

int xilsock_socket(int s, int, int, unsigned char *);
    Creates a socket.

int xilsock_bind(int s, struct sockaddr*, int);
    Binds a socket to an address and port.

int xilsock_listen(int s, int);
    Enables listening on a certain socket.

int xilsock_sendto(int s, unsigned char*, unsigned int, struct sockaddr* to, unsigned int tolen, unsigned char *hw_addr);
    Sends a UDP packet on socket 's' to destination 'to'.

arp.c

int xilnet_arp(unsigned char*, int);
    Handles a received ARP message.

void xilnet_arp_reply(unsigned char*, int);
    An ARP request has been received. This function sends an ARP reply so that anyone on the network will know the hardware address that belongs to our IP address, so we can be found on the Internet.

void xilnet_arp_request(unsigned char *ip);
    Sends an ARP request on the local network to discover the network address of the system with IP address ip.

eth.c

int xilnet_eth_recv_frame(void);
    Processes the received Ethernet frame and dispatches the data to either IP or ARP.

int xilnet_eth_send_frame(unsigned char *, int, unsigned char*, void*, unsigned short);
    Sends an Ethernet frame to hardware address hw_addr. If this argument is NULL, the IP address is used.

void xilnet_eth_update_hw_tbl(unsigned char *, int);
    Update the hardware address table.

void xilnet_add_hw_tbl_entry(unsigned char *, unsigned char *);
    Add an entry to the hardware address table.

int xilnet_eth_get_hw_addr(unsigned char *);
    Looks up an IP address in the hardware address table and returns the result. If the IP address was not found in the table, an ARP request is sent and the function waits for the result. On success the hardware address is returned and added to the table.

void xilnet_eth_init_hw_addr_tbl(void);
    Initializes the hardware address table.

int xilnet_eth_find_old_entry(void);
    Finds the oldest entry in the hardware address table.

ip.c

void xilnet_ip_init(unsigned char*);
    Initializes IP.

int xilnet_ip(unsigned char*, int);
    Handles a received IP message.

void xilnet_ip_header(unsigned char*, int, int, unsigned char*);
    Generates values for the IP header.

unsigned short xilnet_ip_calc_chksum(unsigned char*, int);
    Calculates the checksum for the IP header, given the data and header.
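The IP header checksum is the standard Internet checksum: the one's complement of the one's complement sum of all 16-bit words of the header, computed with the checksum field set to zero. The sketch below is a generic formulation of that algorithm and not the code of xilnet_ip_calc_chksum(); the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns the Internet checksum over len bytes of data, interpreting the
   bytes as big-endian 16-bit words. An odd trailing byte is padded with zero. */
uint16_t inet_checksum(const uint8_t *data, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    assert(data != NULL || len == 0);

    for (i = 0; i + 1 < len; i += 2)
        sum += ((uint32_t)data[i] << 8) | data[i + 1];
    if (len & 1)
        sum += (uint32_t)data[len - 1] << 8;

    /* Fold the carries back into the lower 16 bits. */
    while (sum >> 16)
        sum = (sum & 0xFFFFu) + (sum >> 16);

    return (uint16_t)~sum;
}
```

A receiver can verify a header by running the same function over the header including the transmitted checksum; the result is zero for an intact header.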


A.3.1 Linker script

_STACK_SIZE = DEFINED(_STACK_SIZE) ? _STACK_SIZE : 0x1FF0;
_HEAP_SIZE = DEFINED(_HEAP_SIZE) ? _HEAP_SIZE : 0x1FF0;

/* Define all the memory regions in the system */
MEMORY
{
    plb_bram_if_cntlr_stacknheap : ORIGIN = 0xfff00000, LENGTH = 0x1ffff
    plb_bram_if_cntlr_1 : ORIGIN = 0xffff0000, LENGTH = 0xffff - 4
    boot : ORIGIN = 0xfffffffc, LENGTH = 4
}

PHDRS
{
    ivector PT_LOAD ;
    stack PT_LOAD ;
    heap PT_LOAD ;
    program PT_LOAD ;
    data1 PT_LOAD ;
    data2 PT_LOAD ;
    boot0 PT_LOAD ;
    boot PT_LOAD ;
}

/*
 * Specify the default entry point to the program
 */
ENTRY(_boot)
STARTUP(boot.o)
/* GROUP(libxil.a libc.a) */

/*
 * Define the sections, and where they are mapped in memory
 */
SECTIONS
{
    .vectors :
    {
        *(.vectors)
    } > plb_bram_if_cntlr_1 : ivector

    .bss_stack :
    {
        /* add stack and align to 16 byte boundary */
        . = . + _STACK_SIZE;
        . = ALIGN(16);
        __stack = .;
    } > plb_bram_if_cntlr_1 : stack

    .bss_heap :
    {
        /* add heap and align to 16 byte boundary */
        . = ALIGN(16);
        _heap_start = .;
        . = . + _HEAP_SIZE;
        . = ALIGN(16);
        _heap_end = .;
    } > plb_bram_if_cntlr_1 : heap

    /*
     * This section is only needed if you have
     * interrupts or exceptions
     * Ensure that this is on a 64K boundary
     */
    .text :
    {
        *(.text)
    } > plb_bram_if_cntlr_stacknheap : program

    _etext = .;
    PROVIDE (etext = .);
    PROVIDE (__etext = .);

    .rodata :
    {
        *(.rodata)
    } > plb_bram_if_cntlr_1 : data1

    .sdata2 : { *(.sdata2) } > plb_bram_if_cntlr_1
    __SDATA2_START__ = ADDR(.sdata2);
    __SDATA2_END__ = ADDR(.sdata2) + SIZEOF(.sdata2);

    .sbss2 : { *(.sbss2) } > plb_bram_if_cntlr_1
    __SBSS2_START__ = ADDR(.sbss2);
    __SBSS2_END__ = ADDR(.sbss2) + SIZEOF(.sbss2);

    .data :
    {
        *(.data)
    } > plb_bram_if_cntlr_1 : data2

    .fixup : { *(.fixup) } > plb_bram_if_cntlr_1
    .got1 : { *(.got1) } > plb_bram_if_cntlr_1
    .got2 : { *(.got2) } > plb_bram_if_cntlr_1

    /* We want the small data sections together, so single-instruction offsets
       can access them all, and initialized data all before uninitialized, so
       we can shorten the on-disk segment size. */
    .sdata : { *(.sdata) } > plb_bram_if_cntlr_1
    __SDATA_START__ = ADDR(.sdata);
    __SDATA_END__ = ADDR(.sdata) + SIZEOF(.sdata);

    _edata = .;
    PROVIDE (edata = .);
    PROVIDE (__edata = .);

    .sbss :
    {
        __sbss_start = .;
        ___sbss_start = .;
        *(.sbss)
        *(.scommon)
        __sbss_end = .;
        ___sbss_end = .;
    } > plb_bram_if_cntlr_1

    .bss :
    {
        __bss_start = .;
        ___bss_start = .;
        *(.bss)
        *(COMMON)
        . = ALIGN(4);
        __bss_end = .;
    } > plb_bram_if_cntlr_1

    /* BOOT0 must be a memory whose address is within 24 bits of .boot */
    /* Normally it is the same physical memory as .boot */
    .boot0 : { *(.boot0)} > plb_bram_if_cntlr_1 : boot0

    _end = . ;
    end = .;
    __end = .;

    /* There needs to be some memory defined at this address */
    .boot 0xFFFFFFFC : { *(.boot) } : boot
}


Appendix B. Hardware description

This appendix presents the details on the hardware accelerators Crosscorrelation and Dual Lattice Filter. Section B.1 describes the pseudocode of the FSM, the synthesis report and the simulation results of the Crosscorrelation. Section B.2 describes the synthesis report and the simulation results of the Dual Lattice Filter IP Core.

B.1 Crosscorrelation

This section presents the FSM pseudocode, the most interesting part of the synthesis report and the simulation results of the Crosscorrelation IP Core.

B.1.1 FSM pseudocode

// input:
//   dp[-120..-1] are past reconstructed residual samples
//   d[0..39] are current residual samples
//
// output:
//   power          : L2 norm of the dp signal
//   max_corr_index : index of the maximum of the crosscorrelation function,
//                    where 0 corresponds to a lag of -120 and 80 to a lag of -40
//   max_corr       : value of the maximum of the crosscorrelation function

start:
    if start = 0 then
        goto start;

    max_corr_index := 0;
    max_corr := MIN_INTEGER;
    power := 0;

    // two lags (corr1, corr2) are evaluated per outer iteration
    for corr1_index:=0 to 80 step 2 do
    begin
        corr2_index := corr1_index + 1;
        corr1 := 0;
        corr2 := 0;

        for j:=0 to 39 do
        begin
            corr1 := corr1 + dp[corr1_index-120+j] * d[j];
            if (corr2_index <> 81) then
                corr2 := corr2 + dp[corr2_index-120+j] * d[j];
            else
                // last iteration: the second multiplier accumulates the power
                power := power + d[j] * d[j];
        end;

        if (corr1 > max_corr) then
        begin
            max_corr_index := corr1_index;
            max_corr := corr1;
        end;

        if (corr2_index <> 81) then
        begin
            if (corr2 > max_corr) then
            begin
                max_corr_index := corr2_index;
                max_corr := corr2;
            end;
        end;
    end;

    goto start;

B.1.2 Synthesis report

=========================================================================

* Advanced HDL Synthesis *

=========================================================================

Advanced RAM inference ...

Advanced multiplier inference ...

Found registered multiplier on the signal <_n0076> with 1 register level(s).

Found registered multiplier on the signal <_n0077> with 1 register level(s).

Advanced Registered AddSub inference ...

Selecting encoding for FSM_0 ...

Optimizing FSM <FSM_0> on signal <current_state> with one-hot encoding.

Dynamic shift register inference ...

=========================================================================

HDL Synthesis Report

Macro Statistics

# FSMs : 1

# Block RAMs : 1

256x32-bit dual-port block RAM : 1

# Multipliers : 2

16x16-bit registered multiplier : 2

# Adders/Subtractors : 10

34-bit adder : 4

8-bit adder : 6

# Registers : 87

1-bit register : 65

3-bit register : 2

32-bit register : 5

4-bit register : 1

8-bit register : 8

34-bit register : 2

16-bit register : 4

# Comparators : 3

34-bit comparator less : 1

34-bit comparator greater : 2

# Multiplexers : 7

32-bit 2-to-1 multiplexer : 1

1-bit 2-to-1 multiplexer : 5

8-bit 2-to-1 multiplexer : 1

# Xors : 4

1-bit xor2 : 4

=========================================================================

=========================================================================

* Final Report *

=========================================================================

Final Results

RTL Top Level Output File Name : ../implementation/...orr_0_wrapper/opb_crosscorr_0_wrapper.ngr

Top Level Output File Name : ../implementation/...orr_0_wrapper/opb_crosscorr_0_wrapper.ngc


Output Format : ngc

Optimization Goal : speed

Keep Hierarchy : no

Design Statistics

# IOs : 109

Macro Statistics :

# RAM : 1

# 256x32-bit dual-port block RAM: 1

# Registers : 72

# 1-bit register : 50

# 16-bit register : 4

# 3-bit register : 2

# 32-bit register : 5

# 34-bit register : 2

# 4-bit register : 1

# 8-bit register : 8

# Multiplexers : 7

# 2-to-1 multiplexer : 7

# Adders/Subtractors : 10

# 34-bit adder : 4

# 8-bit adder : 6

# Multipliers : 2

# 16x16-bit registered multiplier: 2

# Comparators : 3

# 34-bit comparator greater : 2

# 34-bit comparator less : 1

Cell Usage :

# BELS : 1510

# GND : 1

# LUT1 : 48

# LUT1_L : 2

# LUT2 : 186

# LUT2_D : 7

# LUT2_L : 162

# LUT3 : 127

# LUT3_D : 2

# LUT3_L : 15

# LUT4 : 382

# LUT4_D : 10

# LUT4_L : 42

# MUXCY : 288

# MUXF5 : 65

# VCC : 1

# XORCY : 172

# FlipFlops/Latches : 423

# FD : 74

# FDE : 16

# FDR : 232

# FDRE : 64

# FDRS : 36

# FDS : 1

# RAMS : 1

# RAMB16_S36_S36 : 1

# MULTs : 2

# MULT18X18S : 2

=========================================================================

Device utilization summary:

---------------------------

Selected Device : 2vp30ff896-7


Number of Slices: 524 out of 13696 3%

Number of Slice Flip Flops: 423 out of 27392 1%

Number of 4 input LUTs: 983 out of 27392 3%

Number of BRAMs: 1 out of 136 0%

Number of MULT18X18s: 2 out of 136 1%

=========================================================================

TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE.

FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

GENERATED AFTER PLACE-and-ROUTE.

Clock Information:

------------------

-----------------------------------+------------------------+-------+

Clock Signal | Clock buffer(FF name) | Load |

-----------------------------------+------------------------+-------+

OPB_Clk | NONE | 426 |

-----------------------------------+------------------------+-------+

Timing Summary:

---------------

Speed Grade: -7

Minimum period: 8.653ns (Maximum Frequency: 115.574MHz)

Minimum input arrival time before clock: 1.763ns

Maximum output required time after clock: 1.135ns

Maximum combinational path delay: 0.275ns

Timing Detail:

--------------

All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------

Timing constraint: Default period analysis for Clock ’OPB_Clk’

Delay: 8.653ns (Levels of Logic = 15)

Source: opb_crosscorr_0/OPB_IPIF_I/OPB_BAM_I/opb_abus_s0_0 (FF)

Destination: opb_crosscorr_0/OPB_IPIF_I/OPB_BAM_I/sln_dbus_s2_30 (FF)

Source Clock: OPB_Clk rising

Destination Clock: OPB_Clk rising

Data Path: opb_crosscorr_0/...PB_BAM_I/opb_abus_s0_0 to opb_crosscorr_0/...B_BAM_I/sln_dbus_s2_30

Gate Net

Cell:in->out fanout Delay Delay Logical Name (Net Name)

---------------------------------------- ------------

FD:C->Q 1 0.370 0.360 opb_crosscorr_0/OPB_IPIF_I/OPB_BAM_I/opb_abus_s0_0...

LUT4_L:I0->LO 1 0.275 0.000 opb_crosscorr_0/OPB_IPIF_I/OPB_BAM_I/DEVICESEL_S0_...

MUXCY:S->O 1 0.334 0.000 opb_crosscorr_0/OPB_IPIF_I/OPB_BAM_I/DEVICESEL_S0_...

MUXCY:CI->O 1 0.036 0.000 opb_crosscorr_0/OPB_IPIF_I/OPB_BAM_I/DEVICESEL_S0_...

MUXCY:CI->O 1 0.036 0.000 opb_crosscorr_0/OPB_IPIF_I/OPB_BAM_I/DEVICESEL_S0_...

MUXCY:CI->O 1 0.036 0.000 opb_crosscorr_0/OPB_IPIF_I/OPB_BAM_I/DEVICESEL_S0_...

MUXCY:CI->O 3 0.036 0.000 opb_crosscorr_0/OPB_IPIF_I/OPB_BAM_I/DEVICESEL_S0_...

MUXCY:CI->O 1 0.036 0.000 opb_crosscorr_0/OPB_IPIF_I/OPB_BAM_I/CS_I0/MUXCY_I...

MUXCY:CI->O 7 0.600 0.540 opb_crosscorr_0/OPB_IPIF_I/OPB_BAM_I/CS_I0/MUXCY_I...

LUT4:I0->O 16 0.275 0.700 opb_crosscorr_0/OPB_IPIF_I/OPB_BAM_I/_n0102 (opb_c...

MUXCY:CI->O 4 0.416 0.500 opb_crosscorr_0/OPB_IPIF_I/OPB_BAM_I/CE_I03/MUXCY_...

LUT2_D:I0->O 7 0.275 0.540 opb_crosscorr_0/OPB_IPIF_I/OPB_BAM_I/_n01091 (opb_...

LUT4:I1->O 1 0.275 0.360 opb_crosscorr_0/OPB_IPIF_I/OPB_BAM_I/Mmux_ipic_xfe...

LUT4_L:I1->LO 1 0.275 0.100 opb_crosscorr_0/OPB_IPIF_I/OPB_BAM_I/Mmux_ipic_xfe...

LUT4:I3->O 3 0.275 0.490 opb_crosscorr_0/OPB_IPIF_I/OPB_BAM_I/Mmux_ipic_xfe...

LUT3:I2->O 16 0.275 0.700 opb_crosscorr_0/OPB_IPIF_I/OPB_BAM_I/_n00271 (opb_...


FDR:R 0.536 opb_crosscorr_0/OPB_IPIF_I/OPB_BAM_I/sln_dbus_s2_16

----------------------------------------

Total 8.653ns (4.363ns logic, 4.290ns route)

(50.4% logic, 49.6% route)

-------------------------------------------------------------------------

Timing constraint: Default OFFSET IN BEFORE for Clock ’OPB_Clk’

Offset: 1.763ns (Levels of Logic = 1)

Source: OPB_Rst (PAD)

Destination: opb_crosscorr_0/USER_LOGIC_I/ar_read_ack_dly1 (FF)

Destination Clock: OPB_Clk rising

Data Path: OPB_Rst to opb_crosscorr_0/USER_LOGIC_I/ar_read_ack_dly1

Gate Net

Cell:in->out fanout Delay Delay Logical Name (Net Name)

---------------------------------------- ------------

LUT2:I1->O 292 0.275 0.952 opb_crosscorr_0/OPB_IPIF_I/OPB_BAM_I/RESET_MIR_I0/...

FDRE:R 0.536 opb_crosscorr_0/USER_LOGIC_I/slv_reg5_0

----------------------------------------

Total 1.763ns (0.811ns logic, 0.952ns route)

(46.0% logic, 54.0% route)

-------------------------------------------------------------------------

Timing constraint: Default OFFSET OUT AFTER for Clock ’OPB_Clk’

Offset: 1.135ns (Levels of Logic = 1)

Source: opb_crosscorr_0/OPB_IPIF_I/OPB_BAM_I/sln_xferack_s2 (FF)

Destination: Sl_xferAck (PAD)

Source Clock: OPB_Clk rising

Data Path: opb_crosscorr_0/OPB_IPIF_I/OPB_BAM_I/sln_xferack_s2 to Sl_xferAck

Gate Net

Cell:in->out fanout Delay Delay Logical Name (Net Name)

---------------------------------------- ------------

FDR:C->Q 3 0.370 0.490 opb_crosscorr_0/OPB_IPIF_I/OPB_BAM_I/sln_xferack_s...

LUT2:I0->O 0 0.275 0.000 opb_crosscorr_0/OPB_IPIF_I/OPB_BAM_I/Sln_xferAck1

----------------------------------------

Total 1.135ns (0.645ns logic, 0.490ns route)

(56.8% logic, 43.2% route)

-------------------------------------------------------------------------

Timing constraint: Default path analysis

Delay: 0.275ns (Levels of Logic = 1)

Source: OPB_select (PAD)

Destination: Sl_xferAck (PAD)

Data Path: OPB_select to Sl_xferAck

Gate Net

Cell:in->out fanout Delay Delay Logical Name (Net Name)

---------------------------------------- ------------

LUT2:I1->O 0 0.275 0.000 opb_crosscorr_0/OPB_IPIF_I/OPB_BAM_I/Sln_errAck1

----------------------------------------

Total 0.275ns (0.275ns logic, 0.000ns route)

(100.0% logic, 0.0% route)

=========================================================================
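The headline figures of the timing report can be cross-checked by hand. The following is a minimal Python sketch, not part of the original tool flow: the function and variable names are illustrative assumptions, and the input numbers are copied from the report above. It recomputes the maximum frequency as the reciprocal of the minimum period, and the logic/route split of the critical path.

```python
# Illustrative cross-check of the XST timing estimate for opb_crosscorr_0.
# All names are hypothetical; the numeric inputs are copied from the report.
from typing import Tuple


def max_frequency_mhz(min_period_ns: float) -> float:
    """Maximum clock frequency in MHz for a minimum period given in ns."""
    return 1000.0 / min_period_ns


def split_percent(logic_ns: float, route_ns: float) -> Tuple[float, float]:
    """Share of logic delay and routing delay on a path, in percent."""
    total = logic_ns + route_ns
    return 100.0 * logic_ns / total, 100.0 * route_ns / total


min_period_ns = 8.653                # "Minimum period" for OPB_Clk
logic_ns, route_ns = 4.363, 4.290    # critical path: logic and route delay

f_max = max_frequency_mhz(min_period_ns)
logic_pct, route_pct = split_percent(logic_ns, route_ns)

print(f"f_max = {f_max:.2f} MHz")
print(f"logic = {logic_pct:.1f} %, route = {route_pct:.1f} %")
```

The recomputed frequency agrees with the reported 115.574 MHz to within about 0.01 MHz; the small difference is presumably due to rounding of the printed period. As the report itself notes, these are synthesis estimates, so the post place-and-route trace report remains the authoritative source.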

B.1.3 Simulation


[Figure: simulation waveform of testbench crosscorr_fsm2_tb (architecture bench), time axis 0 to 80 us, dated Tue Mar 07 3:26:59 PM W. Europe Standard Time 2006. Signals under /crosscorr_fsm2_tb/: clk, reset, start, restart, j_in, data_from_mem, power, max_corr, max_corr_index, read_address, finished.]


[Figure: simulation waveform of testbench crosscorr_fsm2_tb (architecture bench), time axis 0 to 80 us, dated Tue Mar 07 3:42:52 PM W. Europe Standard Time 2006. Signals under /crosscorr_fsm2_tb/: clk, reset, start, restart, j_in, data_from_mem, power, max_corr, max_corr_index, read_address, finished. Internal signals under /crosscorr_fsm2_tb/uut/: op1a_cmb, op1b_cmb, op2a_cmb, op2b_cmb, prod1_sig, prod2_sig, corr1_cmb, corr2_cmb, corr1_i_cmb, corr2_i_cmb, d_address_cmb, dp_address_cmb, d_start_address_cmb, dp_start_address_cmb.]


[Figure: simulation waveform of testbench crosscorr_fsm2_tb (architecture bench), dated Tue Mar 07 4:01:14 PM W. Europe Standard Time 2006. Signals under /crosscorr_fsm2_tb/: clk, reset, start, restart, j_in, data_from_mem, power, max_corr, max_corr_index, read_address, finished. Internal signals under /crosscorr_fsm2_tb/uut/: op1a_cmb, op1b_cmb, op2a_cmb, op2b_cmb, prod1_sig, prod2_sig, corr1_cmb, corr2_cmb, corr1_i_cmb, corr2_i_cmb, d_address_cmb, dp_address_cmb, d_start_address_cmb, dp_start_address_cmb.]


[Figure: simulation waveform of testbench crosscorr_fsm2_tb (architecture bench), dated Tue Mar 07 5:02:52 PM W. Europe Standard Time 2006. Signals under /crosscorr_fsm2_tb/: clk, reset, start, restart, j_in, data_from_mem, power, max_corr, max_corr_index, read_address, finished. Internal signals under /crosscorr_fsm2_tb/uut/: op1a_cmb, op1b_cmb, op2a_cmb, op2b_cmb, prod1_sig, prod2_sig, corr1_cmb, corr2_cmb, corr1_i_cmb, corr2_i_cmb, d_address_cmb, dp_address_cmb, d_start_address_cmb, dp_start_address_cmb.]


B.2 Dual Lattice Filter

This section presents the most interesting part of the synthesis report and the simulation results of the Dual Lattice Filter IP Core.

B.2.1 Synthesis report

=========================================================================

* Advanced HDL Synthesis *

=========================================================================

Advanced RAM inference ...

Advanced multiplier inference ...

Found registered multiplier on the signal <_n0214> with 1 register level(s).

Found registered multiplier on the signal <_n0215> with 1 register level(s).

Advanced Registered AddSub inference ...

Selecting encoding for FSM_0 ...

Optimizing FSM <FSM_0> on signal <current_state> with one-hot encoding.

Dynamic shift register inference ...

=========================================================================

HDL Synthesis Report

Macro Statistics

# FSMs : 1

# Block RAMs : 2

256x32-bit dual-port block RAM : 2

# Multipliers : 2

16x16-bit registered multiplier : 2

# Adders/Subtractors : 11

8-bit adder : 1

16-bit subtractor : 1

16-bit adder : 9

# Registers : 159

16-bit register : 65

8-bit register : 1

32-bit register : 4

1-bit register : 86

3-bit register : 2

4-bit register : 1

# Latches : 1

16-bit latch : 1

# Multiplexers : 10

32-bit 2-to-1 multiplexer : 1

16-bit 2-to-1 multiplexer : 4

1-bit 2-to-1 multiplexer : 5

=========================================================================

=========================================================================

* Final Report *

=========================================================================

Final Results

RTL Top Level Output File Name : ../implementation/...0_wrapper/opb_dual_lat_filt_0_wrapper.ngr

Top Level Output File Name : ../implementation/...0_wrapper/opb_dual_lat_filt_0_wrapper.ngc

Output Format : ngc

Optimization Goal : speed

Keep Hierarchy : no

Design Statistics

# IOs : 109


Macro Statistics :

# RAM : 2

# 256x32-bit dual-port block RAM: 2

# Registers : 95

# 1-bit register : 22

# 16-bit register : 65

# 3-bit register : 2

# 32-bit register : 4

# 4-bit register : 1

# 8-bit register : 1

# Multiplexers : 10

# 2-to-1 multiplexer : 10

# Adders/Subtractors : 11

# 16-bit adder : 9

# 16-bit subtractor : 1

# 8-bit adder : 1

# Multipliers : 2

# 16x16-bit registered multiplier: 2

Cell Usage :

# BELS : 2527

# GND : 1

# LUT1 : 18

# LUT1_D : 1

# LUT1_L : 10

# LUT2 : 523

# LUT2_D : 6

# LUT2_L : 32

# LUT3 : 170

# LUT3_D : 3

# LUT3_L : 13

# LUT4 : 1237

# LUT4_D : 41

# LUT4_L : 79

# MUXCY : 231

# MUXF5 : 3

# VCC : 1

# XORCY : 158

# FlipFlops/Latches : 1289

# FD : 90

# FDE : 512

# FDR : 132

# FDRE : 276

# FDRS : 214

# FDRSE : 48

# FDS : 1

# LD : 16

# RAMS : 2

# RAMB16_S36_S36 : 2

# MULTs : 2

# MULT18X18S : 2

=========================================================================

Device utilization summary:

---------------------------

Selected Device : 2vp30ff896-7

Number of Slices: 1194 out of 13696 8%

Number of Slice Flip Flops: 1289 out of 27392 4%

Number of 4 input LUTs: 2133 out of 27392 7%

Number of BRAMs: 2 out of 136 1%

Number of MULT18X18s: 2 out of 136 1%


=========================================================================

TIMING REPORT

NOTE: THESE TIMING NUMBERS ARE ONLY A SYNTHESIS ESTIMATE.

FOR ACCURATE TIMING INFORMATION PLEASE REFER TO THE TRACE REPORT

GENERATED AFTER PLACE-and-ROUTE.

Clock Information:

------------------

-----------------------------------+------------------------+-------+

Clock Signal | Clock buffer(FF name) | Load |

-----------------------------------+------------------------+-------+

OPB_Clk | NONE | 1277 |

opb_dual_lat_filt_0/USER_LOGIC_I/current_state_FFd34:Q| NONE | 16 |

-----------------------------------+------------------------+-------+

Timing Summary:

---------------

Speed Grade: -7

Minimum period: 8.466ns (Maximum Frequency: 118.117MHz)

Minimum input arrival time before clock: 3.865ns

Maximum output required time after clock: 1.155ns

Maximum combinational path delay: 0.275ns

Timing Detail:

--------------

All values displayed in nanoseconds (ns)

-------------------------------------------------------------------------

Timing constraint: Default period analysis for Clock ’OPB_Clk’

Delay: 8.466ns (Levels of Logic = 15)

Source: opb_dual_lat_filt_0/OPB_IPIF_I/OPB_BAM_I/opb_abus_s0_0 (FF)

Destination: opb_dual_lat_filt_0/OPB_IPIF_I/OPB_BAM_I/sln_dbus_s2_23 (FF)

Source Clock: OPB_Clk rising

Destination Clock: OPB_Clk rising

Data Path: opb_dual_lat_filt_0/...AM_I/opb_abus_s0_0 to opb_dual_lat_filt_0/...M_I/sln_dbus_s2_23

Gate Net

Cell:in->out fanout Delay Delay Logical Name (Net Name)

---------------------------------------- ------------

FD:C->Q 1 0.370 0.360 opb_dual_lat_filt_0/OPB_IPIF_I/OPB_BAM_I/opb_abus_...

LUT4_L:I0->LO 1 0.275 0.000 opb_dual_lat_filt_0/OPB_IPIF_I/OPB_BAM_I/DEVICESEL...

MUXCY:S->O 1 0.334 0.000 opb_dual_lat_filt_0/OPB_IPIF_I/OPB_BAM_I/DEVICESEL...

MUXCY:CI->O 1 0.036 0.000 opb_dual_lat_filt_0/OPB_IPIF_I/OPB_BAM_I/DEVICESEL...

MUXCY:CI->O 1 0.036 0.000 opb_dual_lat_filt_0/OPB_IPIF_I/OPB_BAM_I/DEVICESEL...

MUXCY:CI->O 1 0.036 0.000 opb_dual_lat_filt_0/OPB_IPIF_I/OPB_BAM_I/DEVICESEL...

MUXCY:CI->O 3 0.036 0.000 opb_dual_lat_filt_0/OPB_IPIF_I/OPB_BAM_I/DEVICESEL...

MUXCY:CI->O 1 0.036 0.000 opb_dual_lat_filt_0/OPB_IPIF_I/OPB_BAM_I/CS_I0/MUX...

MUXCY:CI->O 8 0.600 0.560 opb_dual_lat_filt_0/OPB_IPIF_I/OPB_BAM_I/CS_I0/MUX...

LUT4:I0->O 17 0.275 0.710 opb_dual_lat_filt_0/OPB_IPIF_I/OPB_BAM_I/_n0186 (o...

MUXCY:CI->O 1 0.036 0.000 opb_dual_lat_filt_0/OPB_IPIF_I/OPB_BAM_I/CE_I03/MU...

MUXCY:CI->O 38 0.416 0.812 opb_dual_lat_filt_0/OPB_IPIF_I/OPB_BAM_I/CE_I03/MU...

LUT3:I0->O 1 0.275 0.360 opb_dual_lat_filt_0/OPB_IPIF_I/OPB_BAM_I/Mmux_ipic...

LUT4:I3->O 1 0.275 0.360 opb_dual_lat_filt_0/OPB_IPIF_I/OPB_BAM_I/Mmux_ipic...

LUT4_D:I0->O 2 0.275 0.480 opb_dual_lat_filt_0/OPB_IPIF_I/OPB_BAM_I/Mmux_ipic...

LUT4:I0->O 16 0.275 0.700 opb_dual_lat_filt_0/OPB_IPIF_I/OPB_BAM_I/_n00551_1...

FDR:R 0.536 opb_dual_lat_filt_0/OPB_IPIF_I/OPB_BAM_I/sln_dbus_...

----------------------------------------

Total 8.466ns (4.124ns logic, 4.342ns route)

(48.7% logic, 51.3% route)

-------------------------------------------------------------------------


Timing constraint: Default OFFSET IN BEFORE for Clock ’OPB_Clk’

Offset: 3.865ns (Levels of Logic = 4)

Source: OPB_Rst (PAD)

Destination: opb_dual_lat_filt_0/USER_LOGIC_I/refl1_0_to_12_15 (FF)

Destination Clock: OPB_Clk rising

Data Path: OPB_Rst to opb_dual_lat_filt_0/USER_LOGIC_I/refl1_0_to_12_15

Gate Net

Cell:in->out fanout Delay Delay Logical Name (Net Name)

---------------------------------------- ------------

LUT2:I1->O 317 0.275 0.952 opb_dual_lat_filt_0/OPB_IPIF_I/OPB_BAM_I/RESET_MIR...

LUT4_D:I2->O 3 0.275 0.490 opb_dual_lat_filt_0/USER_LOGIC_I/Ker521941 (opb_du...

LUT4_D:I1->O 1 0.275 0.360 opb_dual_lat_filt_0/USER_LOGIC_I/Ker524461_SW2 (N8...

LUT4:I2->O 16 0.275 0.700 opb_dual_lat_filt_0/USER_LOGIC_I/_n03551_1 (opb_du...

FDE:CE 0.263 opb_dual_lat_filt_0/USER_LOGIC_I/refl3_0_to_12_9

----------------------------------------

Total 3.865ns (1.363ns logic, 2.502ns route)

(35.3% logic, 64.7% route)

-------------------------------------------------------------------------

Timing constraint: Default OFFSET OUT AFTER for Clock ’OPB_Clk’

Offset: 1.155ns (Levels of Logic = 1)

Source: opb_dual_lat_filt_0/OPB_IPIF_I/OPB_BAM_I/sln_xferack_s2 (FF)

Destination: Sl_xferAck (PAD)

Source Clock: OPB_Clk rising

Data Path: opb_dual_lat_filt_0/OPB_IPIF_I/OPB_BAM_I/sln_xferack_s2 to Sl_xferAck

Gate Net

Cell:in->out fanout Delay Delay Logical Name (Net Name)

---------------------------------------- ------------

FDR:C->Q 5 0.370 0.510 opb_dual_lat_filt_0/OPB_IPIF_I/OPB_BAM_I/sln_xfera...

LUT2:I0->O 0 0.275 0.000 opb_dual_lat_filt_0/OPB_IPIF_I/OPB_BAM_I/Sln_xferA...

----------------------------------------

Total 1.155ns (0.645ns logic, 0.510ns route)

(55.8% logic, 44.2% route)

-------------------------------------------------------------------------

Timing constraint: Default path analysis

Delay: 0.275ns (Levels of Logic = 1)

Source: OPB_select (PAD)

Destination: Sl_xferAck (PAD)

Data Path: OPB_select to Sl_xferAck

Gate Net

Cell:in->out fanout Delay Delay Logical Name (Net Name)

---------------------------------------- ------------

LUT2:I1->O 0 0.275 0.000 opb_dual_lat_filt_0/OPB_IPIF_I/OPB_BAM_I/Sln_errAc...

----------------------------------------

Total 0.275ns (0.275ns logic, 0.000ns route)

(100.0% logic, 0.0% route)

=========================================================================
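The percentages in the device utilization summary can be recomputed from the raw counts. The sketch below is a hedged illustration in Python, not something taken from the thesis: the dictionary and function names are assumptions, and the counts are copied from the report above. For these numbers, dropping the fractional part reproduces every reported percentage, which suggests the tool truncates rather than rounds.

```python
# Illustrative check of the device utilization summary for opb_dual_lat_filt_0.
# Names are hypothetical; the (used, available) counts are copied from the report.

utilization = {
    "Slices":           (1194, 13696),
    "Slice Flip Flops": (1289, 27392),
    "4 input LUTs":     (2133, 27392),
    "BRAMs":            (2, 136),
    "MULT18X18s":       (2, 136),
}


def percent_used(used: int, available: int) -> float:
    """Exact utilization of a resource in percent."""
    return 100.0 * used / available


for name, (used, available) in utilization.items():
    exact = percent_used(used, available)
    # int() drops the fractional part, matching the figures printed in the report
    print(f"{name:<18} {used:>5} / {available:<6} {exact:5.2f} %  (truncated: {int(exact)} %)")
```

Read this way, the 8% slice figure corresponds to roughly 8.7% of the slices on the selected 2vp30ff896-7 device, so the truncated values slightly understate the actual usage.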

B.2.2 Simulation


[Figure: simulation waveform of testbench dual_lat_fsm_tb (architecture bench), dated Wed Mar 08 3:35:51 PM W. Europe Standard Time 2006. Signals under /dual_lat_fsm_tb/: clk, refl0_0_to_12, refl1_0_to_12, refl2_0_to_12, start, norm_ninv, fsm_read_data, fsm_write_enable, fsm_write_data, fsm_address. Internal signals under /dual_lat_fsm_tb/uut/: n0_cmb to n3_cmb, v0_norm_cmb to v7_norm_cmb, data_in_sel_cmb, data_in_sel_sig, current_state (showing st_start and st_load_refl), smpl_cnt_cmb, prod1_sig, prod2_sig, op1a_cmb, op1b_cmb, op2a_cmb, op2b_cmb, data_out.]


[Figure: simulation waveform of testbench dual_lat_fsm_tb (architecture bench), time axis ticks at 40 us, 80 us and 120 us, dated Wed Mar 08 3:12:54 PM W. Europe Standard Time 2006. Signals under /dual_lat_fsm_tb/: clk, reset, refl0_0_to_12, refl1_0_to_12, refl2_0_to_12, refl0_13_to_26, refl1_13_to_26, refl2_13_to_26, refl0_27_to_39, refl1_27_to_39, refl2_27_to_39, refl0_40_to_159, refl1_40_to_159, refl2_40_to_159, start, norm_ninv, fsm_read_data, fsm_write_enable, fsm_write_data, fsm_address, ready, test_data(0) to test_data(4), processed_data(0) to processed_data(4), stop_the_clock.]


Appendix C. Abbreviations

ACELP    Algebraic Code Excited Linear Prediction
ADPCM    Adaptive Differential Pulse Code Modulation
AEC      Acoustic Echo Cancellation
AGC      Automatic Gain Control
AMR      Adaptive Multi-Rate
APCM     Adaptive Pulse Code Modulation
ARP      Address Resolution Protocol
CNG      Comfort Noise Generation
CS-CELP  Conjugate Structure Code Excited Linear Prediction
DHCP     Dynamic Host Configuration Protocol
DNS      Domain Name System
DTMF     Dual-Tone Multi-Frequency
ETSI     European Telecommunications Standards Institute
FPGA     Field Programmable Gate Array
GSM      Global System for Mobile communications
ICMP     Internet Control Message Protocol
IETF     Internet Engineering Task Force
IP       Internet Protocol / Intellectual Property
ISP      Internet Service Provider
ITU      International Telecommunication Union
LAN      Local Area Network
LD-CELP  Low Delay Code Excited Linear Prediction
MAC      Media Access Controller
MIME     Multipurpose Internet Mail Extensions
MOS      Mean Opinion Score
MTU      Maximum Transmission Unit
MP-MLQ   Multi-Pulse Maximum Likelihood Quantization
NAT      Network Address Translation
NTP      Network Time Protocol
OPB      On-chip Peripheral Bus
PCM      Pulse Code Modulation
PGP      Pretty Good Privacy
PLB      Processor Local Bus
PLC      Packet Loss Concealment
POTS     Plain Old Telephone Service
PSTN     Public Switched Telephone Network
QoS      Quality of Service
RARP     Reverse Address Resolution Protocol


RELP     Residual Excited Linear Prediction
RPE-LTP  Regular Pulse Excited Long Term Prediction
RTCP     RTP Control Protocol
RTP      Real-time Transport Protocol
RTT      Round Trip Time
SIP      Session Initiation Protocol
SDP      Session Description Protocol
STUN     Simple Traversal of UDP through NATs
SVGA     Super VGA (800×600)
TCP      Transmission Control Protocol
TURN     Traversal Using Relay NAT
UA       User Agent
UAC      User Agent Client
UAS      User Agent Server
UDP      User Datagram Protocol
VAD      Voice Activity Detection
VGA      Video Graphics Array (640×480)
VoIP     Voice over IP
WAN      Wide Area Network
WLAN     Wireless LAN
XUP      Xilinx University Program


Curriculum Vitae

M. van den Braak was born on the 7th of August 1981 in Purmerend, The Netherlands. At an early age he was one of the first to experience the joy of a personal computer, the Apple II Plus. This ingenious machine sparked his interest in electronics (he could program almost before he could speak). He attended secondary school at the Da Vinci College in Purmerend. After finishing the VWO in 1999, he started Electrical Engineering at the TU Delft. In 2005 he earned his degree of Bachelor of Science in Electrical Engineering. Meanwhile, he produced the annual book in 2001 and organized a study tour for fellow students to Moscow and St. Petersburg in 2003 on behalf of the study association ETV (Electrotechnische Vereeniging). In addition, he did voluntary work as a theater technician and as treasurer of a Youth Society. He enjoys programming in C, Delphi, PHP, and Java in his spare time, and does not shy away from soldering electrical circuits. He also appreciates all kinds of music and enjoys playing the guitar and piano.