2010 International Conference on Computer Application and System Modeling



An Application of VoIP Communication on Embedded System

Dept of E&C, SJBIT

Chapter-1

INTRODUCTION

Voice over Internet Protocol (VoIP) is a family of transmission technologies for delivering voice communications over packet-switched networks, such as the Internet and other IP networks [1]. Because VoIP makes good use of Internet technologies and the widely connected web environment, it can offer more versatile services at lower, or even no, cost. Moreover, combined with embedded technology, VoIP gives a wide range of hand-held devices access to real-time voice communication over the Internet, which has been a research focus in recent years. Nevertheless, because of the inherent capacity limits of embedded hardware and the real-time requirements of voice communication, an embedded VoIP system has to take many factors into account in its design and weigh the candidate protocols and compression algorithms carefully to find those that suit it best. The VoIP system in this paper is built on an ARM9 embedded platform, adopts SIP and RTP as its network protocols, and employs the CELP compression algorithm to provide low-latency, high-quality communication. In addition, the UDA1341 sound codec and the ALSA sound driver are used for speech recording and playback.

Internet telephony refers to communications services (voice, fax, SMS, and/or voice-messaging applications) that are transported via the Internet rather than the public switched telephone network (PSTN). The steps involved in originating a VoIP telephone call are signaling and media channel setup, digitization of the analog voice signal, encoding, packetization, and transmission as Internet Protocol (IP) packets over a packet-switched network.
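
The digitization, encoding and packetization steps can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the 8 kHz sampling rate, the 20 ms frame length, the sine tone standing in for a microphone, and the raw 16-bit packing standing in for a real codec. It is not the implementation used in this system.

```python
import math
import struct

SAMPLE_RATE_HZ = 8000   # assumed telephony sampling rate
FRAME_MS = 20           # assumed amount of audio carried per packet

def digitize(duration_s, tone_hz=440.0):
    """Stand-in for the A/D converter: sample a sine tone as 16-bit PCM values."""
    count = int(duration_s * SAMPLE_RATE_HZ)
    return [int(32767 * math.sin(2 * math.pi * tone_hz * i / SAMPLE_RATE_HZ))
            for i in range(count)]

def encode(samples):
    """Placeholder for a codec: pack the samples as little-endian 16-bit integers."""
    return struct.pack(f"<{len(samples)}h", *samples)

def packetize(samples):
    """Split the sample stream into fixed-length frames, one frame per packet."""
    frame_len = SAMPLE_RATE_HZ * FRAME_MS // 1000   # 160 samples per frame
    return [encode(samples[i:i + frame_len])
            for i in range(0, len(samples), frame_len)]

packets = packetize(digitize(0.1))
print(len(packets), "packets of", len(packets[0]), "bytes each")
```

In a real terminal each payload would then be handed to the transport layer for delivery as IP packets.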





Chapter-2

BASIC PRINCIPLES

2.1 Basic Principles of VoIP

VoIP is a voice communication technology based on digital voice processing techniques and Internet applications. The basic steps involved in originating a VoIP call are analog-to-digital conversion, signal compression/translation, and IP packetization for transmission; the process is reversed at the receiving end. A VoIP user normally cares most about voice QoS and real-time response, so a VoIP system must give priority to choosing protocols and voice compression methods that meet these needs, even if this sometimes comes at the cost of network reliability.

2.1.1 SIP Protocol

The initialization of a VoIP session can be implemented in various ways, depending on the protocol or standard used. Typical examples are H.323 and the Session Initiation Protocol (SIP). However, the H.323 standard relies heavily on centralized network servers to launch calls, and its message format is too complex for embedded hardware to parse. Its poor extensibility and time-consuming encoding process also make it an unsuitable choice for this VoIP system. SIP is an application-layer signaling protocol widely used for creating, modifying and terminating multimedia communication sessions [3]. Like HTTP and SMTP, SIP uses text-based messages and has proved to have a simpler structure and faster response. VoIP systems that use SIP for session initiation therefore usually achieve better real-time performance in most situations. Furthermore, SIP is a distributed control protocol: it does not depend on central servers for network management and is easy to deploy. These qualities make SIP easy to use and well suited to networks of embedded terminals.
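
To show why a text-based format is cheap to handle on small hardware, the following Python sketch builds and parses a SIP-style request using plain string operations. The addresses and the Call-ID are hypothetical, and the message is deliberately reduced: a complete SIP request carries further mandatory headers (for example Via and Max-Forwards) that are omitted here.

```python
def build_invite(caller, callee, call_id):
    """Build a reduced SIP INVITE as CRLF-separated text lines."""
    lines = [
        f"INVITE sip:{callee} SIP/2.0",
        f"From: <sip:{caller}>",
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        "Content-Length: 0",
    ]
    return "\r\n".join(lines) + "\r\n\r\n"

def parse_message(text):
    """Split a message into its start line and a dictionary of headers."""
    head = text.split("\r\n\r\n", 1)[0]
    start_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")   # split at the first colon only
        headers[name.strip()] = value.strip()
    return start_line, headers

message = build_invite("alice@example.com", "bob@example.com", "call-0001")
start_line, headers = parse_message(message)
print(start_line)
print(headers["To"])
```

Parsing needs nothing beyond splitting on line breaks and colons, which is the property the comparison with H.323 relies on.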





2.1.2 RTP Protocol

RTP (Real-time Transport Protocol) is an application-layer protocol designed to deliver audio and video over the Internet. Because it suits real-time services, RTP often works alongside SIP as a basic network protocol in a VoIP system [4]. RTP is normally used together with RTCP (the RTP Control Protocol). While RTP transports the audio packets over the Internet, RTCP is responsible for monitoring transmission statistics and maintaining QoS.

RTP resides in the application layer, and its underlying transport-layer protocol can be either UDP (User Datagram Protocol) or TCP (Transmission Control Protocol). The VoIP system in this paper cares more about timely transfer of the voice stream as a whole than about exact delivery of every data packet, because occasional losses of small fragments in an audio stream are usually unnoticeable and can also be repaired [5]. The latency that TCP introduces through connection establishment and error correction makes it poorly suited to prompt voice communication, whereas UDP, with its low-latency, connectionless service, is better suited to immediate transmission. All the RTP applications in our VoIP system therefore adopt UDP, rather than TCP, as the transport-layer protocol.
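
The connectionless behaviour of UDP can be seen in a few lines of Python using the standard socket module. This is only a loopback sketch: the 160-byte payload and the use of 127.0.0.1 are assumptions, and a real terminal would send to the peer address learned during SIP session setup.

```python
import socket

def send_frame_over_udp(payload):
    """Send one datagram to a local receiver and return what arrives."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))   # port 0 lets the OS choose a free port
        receiver.settimeout(2.0)          # do not wait forever if the datagram is lost
        # No connect() and no handshake: the datagram is simply sent.
        sender.sendto(payload, receiver.getsockname())
        data, _source = receiver.recvfrom(2048)
        return data
    finally:
        sender.close()
        receiver.close()

received = send_frame_over_udp(b"\x00" * 160)
print(len(received), "bytes received")
```

No acknowledgement or retransmission takes place, so a lost datagram would only show up as a timeout on the receiving side.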

RTP was developed by the Audio/Video Transport working group of the IETF standards organization. RTP is used in conjunction with other protocols such as H.323 and RTSP. The RTP standard defines a pair of protocols, RTP and RTCP: RTP is used for the transfer of multimedia data, and RTCP is used to periodically send control information and QoS parameters.

RTP is designed for end-to-end, real-time transfer of stream data. The protocol provides facilities for jitter compensation and for detecting out-of-sequence arrival of data, both of which are common during transmission over an IP network. RTP supports data transfer to multiple destinations through multicast. RTP is regarded as the primary standard for audio/video transport in IP networks and is used with an associated profile and payload format.





X (Extension): (1 bit) Indicates the presence of an extension header between the standard header and the payload data. This is application or profile specific.

CC (CSRC Count): (4 bits) Contains the number of CSRC identifiers (defined below) that follow the fixed header.

M (Marker): (1 bit) Used at the application level and defined by a profile. If it is set, the current data has some special relevance for the application.

PT (Payload Type): (7 bits) Indicates the format of the payload and determines its interpretation by the application. This is specified by an RTP profile; for example, see the RTP Profile for Audio and Video Conferences with Minimal Control.

Sequence Number: (16 bits) The sequence number is incremented by one for each RTP data packet sent and is used by the receiver to detect packet loss and to restore packet sequence. RTP does not take any action on packet loss; it is left to the application to take the desired action.

Timestamp: (32 bits) Used to enable the receiver to play back the received samples at appropriate intervals. When several media streams are present, the timestamps are independent in each stream and may not be relied upon for media synchronization. The granularity of the timing is application specific. For example, an audio application that samples data once every 125 µs (8 kHz, a common sample rate in digital telephony) could use that value as its clock resolution. The clock granularity is one of the details specified in the RTP profile for an application.

SSRC: (32 bits) The synchronization source identifier uniquely identifies the source of a stream. The synchronization sources within the same RTP session are unique.

CSRC: Contributing source IDs enumerate the contributing sources to a stream that has been generated from multiple sources.

Extension header: (optional) The first 32-bit word contains a profile-specific identifier (16 bits) and a length specifier (16 bits) that indicates the length of the extension in 32-bit units.
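
The field widths listed above map directly onto a 12-byte fixed header, which the following Python sketch packs and unpacks with the standard struct module. The version value 2 and the padding bit come from the RTP specification (RFC 3550) rather than from the list above; the payload type, SSRC value and the 160-tick timestamp step (20 ms of audio at 8 kHz) are hypothetical choices for illustration.

```python
import struct

RTP_VERSION = 2           # per RFC 3550; the V and P fields are not described above
SAMPLES_PER_FRAME = 160   # assumed: 20 ms of audio at an 8 kHz clock

def pack_rtp_header(payload_type, seq, timestamp, ssrc, marker=False, csrcs=()):
    """Pack the fixed RTP header, followed by any CSRC identifiers."""
    if len(csrcs) > 15:
        raise ValueError("CC is a 4-bit field, so at most 15 CSRCs fit")
    byte0 = (RTP_VERSION << 6) | len(csrcs)          # V=2, P=0, X=0, CC
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)   # M, PT
    header = struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                         timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    for csrc in csrcs:
        header += struct.pack("!I", csrc)
    return header

def unpack_rtp_header(data):
    """Decode the fixed header and the CSRC list into a dictionary."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    cc = byte0 & 0x0F
    csrcs = struct.unpack(f"!{cc}I", data[12:12 + 4 * cc])
    return {
        "version": byte0 >> 6,
        "extension": bool(byte0 & 0x10),
        "csrc_count": cc,
        "marker": bool(byte1 & 0x80),
        "payload_type": byte1 & 0x7F,
        "sequence_number": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrcs": list(csrcs),
    }

# Header of the eighth frame (sequence number 7) of a hypothetical stream.
header = pack_rtp_header(payload_type=0, seq=7,
                         timestamp=7 * SAMPLES_PER_FRAME, ssrc=0x1234ABCD)
fields = unpack_rtp_header(header)
print(len(header), fields["sequence_number"], fields["timestamp"])
```

Network byte order ("!" in the format string) is used throughout, since RTP fields are transmitted most significant byte first.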





2.1.3 Voice Compression Technology in VoIP

Voice compression is a process whereby voice data is compacted into a smaller volume for easier transport. In a VoIP system, voice compression can considerably reduce the volume of audio data, and the smaller data size helps relieve the network load and ensure real-time response in VoIP calls. Among existing voice processing methods, Code Excited Linear Prediction (CELP) is generally considered the most successful compression algorithm. CELP speech coding is based on the source-filter model, which assumes that the vocal cords are the source of speech and that the vocal tract acts as a filter that shapes the various sounds of the voice. Because the parameters describing the source and the filter are compact, and the model can represent a voice using these parameters alone, CELP can record and store a speaker's voice at a remarkably low bit rate [6]. Normally, CELP can keep the transmission rate between 2 kbps and 16 kbps.
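
The source-filter idea can be sketched in Python as an all-pole synthesis filter driven by an excitation signal. This shows only the decoder-side filtering step under simplifying assumptions; a real CELP codec also searches a codebook for the best excitation and updates the filter coefficients frame by frame, and the single coefficient used below is an arbitrary illustrative value.

```python
def synthesize(excitation, lpc_coeffs):
    """All-pole synthesis: s[n] = e[n] + sum over k of a[k] * s[n-1-k].

    The excitation plays the role of the source (vocal cords) and the
    coefficients describe the filter (vocal tract).
    """
    out = []
    for n, e in enumerate(excitation):
        s = e
        for k, a in enumerate(lpc_coeffs):
            if n - 1 - k >= 0:
                s += a * out[n - 1 - k]   # feedback from earlier output samples
        out.append(s)
    return out

# A single impulse through a one-coefficient filter gives a decaying response.
response = synthesize([1.0, 0.0, 0.0, 0.0], [0.5])
print(response)
```

Only the excitation description and the few filter coefficients need to be transmitted, which is where the low bit rate comes from.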

2.2 Architecture of Embedded System

The embedded platform provides all the hardware support needed by the VoIP system, such as voice sampling, playback, sending and receiving [7]. In view of cost and performance, the system uses the Samsung S3C2410 as its central processor. The S3C2410 is designed to provide a cost-effective, power-efficient and high-performance microprocessor solution for communication applications and hand-held devices. It is built on an ARM9 core and offers bus interfaces and peripherals including IIC, IIS, an MMU (Memory Management Unit), a 4-channel DMA controller, a 2-channel USB controller and an LCD controller, which fully qualifies it as the hardware infrastructure of the VoIP application.

The audio codec is the most important component of the embedded platform, and the Philips UDA1341 is used to handle speech capture and playback. Figure 1 shows the wiring diagram of the S3C2410 and the UDA1341. The two chips are connected by the IIS and L3 buses, and the sound codec captures and plays voice under the control of the S3C2410.





IIS (Inter-IC Sound) is a serial bus interface for connecting digital audio devices. Its distinctive feature is that the clock signals are separated from the data signals. This substantially reduces signal jitter and distortion during digital/analog conversion and allows the codec to achieve high sound fidelity. Moreover, the IIS interface is connected to the FIFO data channel through DMA, where data is sent and received synchronously, which gives the UDA1341 high throughput in voice recording and playback.

L3 is the built-in control bus interface of the UDA1341. It connects the UDA1341 to the S3C2410 through 3 general-purpose GPIO pins and allows the processor to control the codec's signal timing and operating mode. The L3 bus is also used to control some of the codec's audio features, including volume adjustment, bass boost and soft mute. In addition to the audio chip, the development board comes with an Ethernet card, a WiFi card, a serial port, USB and other peripheral interfaces, which meet the basic communication requirements of the VoIP system.

Figure-1 Wiring diagram of S3C2410 and UDA1341





Chapter-3

MASTER DESIGN OF VoIP SYSTEM

3.1 General Layout

The VoIP system in this paper is architecturally divided into two main parts: the SIP server and the client software. The server part is responsible for locating the calling and called parties and establishing the communication environment before a session actually starts. For reasons of economy and real-time performance, a moderate simplification of the server setup is planned and investigated in our design. The main function of the client side is to send and receive compressed speech streams. To improve real-time performance and make better use of the advantages of UDP, the voice streams travel directly between client terminals, with no SIP server on the transmission path. Bypassing the servers in this way not only greatly reduces the latency of the voice session but also considerably relieves the load on the servers.

3.2 Pattern Layout of SIP Server

3.2.1 Setup of SIP Servers

The SIP server section consists of three parts:

3.2.1.1 User Agent:

Although structurally the UA belongs to the client software section, its actual function is to act as an extension of the SIP server and handle all SIP requests and responses.





3.2.1.2 Register/Location server:

When serving as a register server, the Register/Location server dynamically establishes the mapping between users' logical and physical addresses. This mapping can then be used to support call routing and subscriber mobility. When it receives a location request from a user, the server looks up the requested user's IP address in its mapping list and returns it.
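
The register and location roles described above amount to maintaining and querying a table of bindings. The Python sketch below is a hypothetical illustration of that idea, not the server software used in this system; the class name, the addresses and the use of an in-memory dictionary are all assumptions.

```python
class LocationServer:
    """Keeps the mapping from a user's logical address to a physical IP address."""

    def __init__(self):
        self._bindings = {}

    def register(self, logical_address, ip_address):
        """Create or refresh a binding; re-registering supports mobility."""
        self._bindings[logical_address] = ip_address

    def locate(self, logical_address):
        """Return the current IP address of a user, or None if unregistered."""
        return self._bindings.get(logical_address)

server = LocationServer()
server.register("sip:bob@example.com", "192.0.2.10")
server.register("sip:bob@example.com", "192.0.2.77")   # the user has moved
print(server.locate("sip:bob@example.com"))
```

Because a later registration overwrites the earlier one, callers always obtain the most recent address of the called party.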

3.2.1.3 Proxy/Redirect Server:

In most circumstances the proxy server waits for incoming requests from clients and relays them to the specified resource server according to its access policy. After the resource server responds, the requested content is sent back to the originating client. If a user registered with the server moves to a new location or changes its IP address, the redirection function of SIP is activated: the server finds and returns the user's new IP address by consulting the relevant servers. It is common practice to set up an extra server in a VoIP system to take charge of redirection. When a redirection request occurs, the proxy queries the redirect server for the user's new destination address. Although a dedicated server can facilitate billing management and overall control of the VoIP system, it also increases the construction cost of the system and its response time. In this paper, the proxy and redirection software are deployed together on one server machine in the hope of cutting equipment expenditure and response time. The cost is reduced because fewer machines are involved, and real-time performance also improves because most redirection requests come from the proxy server, and local accesses are far faster than remote ones.





3.2.2 Messaging System of SIP Servers

Figure 2 shows the messaging system of the SIP servers; the solid lines and the dashed lines represent SIP requests and acknowledgements respectively.

Figure-2 Message Flow in SIP protocol

The initial session request triggered by a VoIP client is first delivered from the SIP proxy to the location server to obtain the IP address of the called party. The location server then sends an invite message to the target IP (or the proxy of this IP) and returns a wait-to-confirm message to the calling party. If the IP address is valid and the PC client accepts the invite, the called party returns an acknowledgement message. Once this is confirmed, the two parties in the session have exchanged their IP addresses, and the communication environment is ready for the upcoming voice transmission.
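
The sequence of messages described in this paragraph can be traced with a small Python sketch. It follows the wording of the text rather than the exact SIP message names, and the function name, the tuple layout and the addresses are hypothetical; the outcome for an unknown or declining party is also an assumption, since the text only describes the successful case.

```python
def set_up_session(caller, callee, location_table, callee_accepts=True):
    """Return (trace, ready), where trace lists (sender, receiver, message)."""
    trace = [("proxy", "location server", f"lookup {callee}")]
    callee_ip = location_table.get(callee)
    if callee_ip is None:
        return trace, False                     # called party is not registered
    trace.append(("location server", callee_ip, "invite"))
    trace.append(("location server", caller, "wait-to-confirm"))
    if not callee_accepts:
        return trace, False                     # invite was not accepted
    trace.append((callee_ip, caller, "acknowledgement"))
    return trace, True                          # addresses exchanged, media can start

table = {"bob": "192.0.2.10"}
trace, ready = set_up_session("alice", "bob", table)
for sender, receiver, message in trace:
    print(f"{sender} -> {receiver}: {message}")
```

Once the function reports that the session is ready, the voice stream itself would flow directly between the two terminals.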





3.3 Pattern Layout of Client Software

3.3.1 Overall Structure of Client-side

At the user level, the client software offers the communication control interface, while background procedures handle all sound processing and data transmission. The software structure of the client side comprises three parts, as shown in Figure 3.

Figure-3 Integral Architecture of Client





3.3.1.1 GUI interface:

The GUI is the graphical channel between users and the primary control thread. Through this channel a VoIP user can control the background threads and check their status whenever necessary.

3.3.1.2 Voice processing:

Threads in the voice processing module are in charge of all VoIP sound operations, including capturing, playing, compressing and decompressing. The processing can be further divided into two sub-procedures according to their order of operation: one samples, compresses and sends; the other receives, decompresses and plays.

3.3.1.3 Data channel:

The client software provides two communication channels (each with its own thread) to transport all the data generated by the user's voice and operations. The SIP communication thread (the client-to-server thread), which is implemented with the oSIP/eXosip open source libraries, interacts with the SIP proxy to help the client build the initial session connection [6]. From a functional point of view, the client-to-server thread also corresponds to the user agent in the SIP server section. The RTP communication thread (the client-to-client thread) uses the oRTP library as its protocol base and gives a user direct RTP access to the other party. Unlike ordinary VoIP solutions, the VoIP system in this paper assigns RTP communication an independent thread. Although the implementation becomes more complex, the improvement outweighs the trouble: the voice packets sent by one user bypass the SIP servers and go straight to the other client, so the latency incurred during a session is markedly reduced.





After the voice processing thread has run, the compressed voice data is passed to the RTP thread and encapsulated into IP packets for delivery. The relationship between these two threads clearly matches the producer-consumer model, and an effective way to improve the throughput between them is to set up a shared critical area. Sharing this area lets the two threads run concurrently, and system idle time is greatly shortened.
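
The producer-consumer relationship between the two threads can be sketched with Python's standard threading and queue modules. This is an illustration of the pattern only: the bounded queue stands in for the shared critical area, the byte strings stand in for compressed voice frames, and the buffer size and sentinel convention are assumptions.

```python
import queue
import threading

def run_pipeline(frames, buffer_size=4):
    """Pass frames from a producer thread to a consumer thread via a shared buffer."""
    shared = queue.Queue(maxsize=buffer_size)   # bounded, thread-safe shared area
    sent = []

    def voice_thread():
        # Producer: stands in for the voice processing thread.
        for frame in frames:
            shared.put(frame)    # blocks while the buffer is full
        shared.put(None)         # sentinel: no more frames

    def rtp_thread():
        # Consumer: stands in for the RTP sending thread.
        while True:
            frame = shared.get() # blocks while the buffer is empty
            if frame is None:
                break
            sent.append(frame)   # a real thread would packetize and send here

    producer = threading.Thread(target=voice_thread)
    consumer = threading.Thread(target=rtp_thread)
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return sent

delivered = run_pipeline([b"frame0", b"frame1", b"frame2"])
print(delivered)
```

The queue handles the locking internally, so neither thread waits unless the buffer is completely full or completely empty.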

3.3.3 Implementation of RTP Transmission

Applications in the RTP module are implemented on the basis of the oRTP software package, as shown in Figure 5.

Figure-5 Integral Design of the RTP Module

The primary control module sits at the center of the RTP thread; it monitors the other sub-modules and keeps them working in good order. The major function of the RTP module is to send the processed voice to, and receive it from, the other side of the session. It is also responsible for generating RTP statistics and QoS information for the subsequent RTCP quality check. By periodically exchanging and verifying these statistics, the RTCP module can detect any abnormal situation and report it to the primary control module. If necessary, the control module adjusts the RTP rate and packet load to maintain transmission QoS.

According to the RTP standard, each RTP packet contains two parts, the header and the payload. The payload carries the voice data, while the header carries the information needed for QoS maintenance. The most important item of this auxiliary information is the RTP sequence number (RSN). The RSN is a number assigned to each RTP packet by the sending client. Because the RSN increases by one with each packet, the RTP receiving thread can easily sort voice packets and reassemble them in their original order. In the voice processing thread, the RSN is used together with Speex to make the conversation more robust. In this procedure, the RTP receiving thread first checks the RSNs for packet loss. If a packet has been lost, the RTP thread informs Speex so that it can repair the lost fragments and conceal the gaps in the voice stream. With this quality assurance mechanism, the system tolerates data loss well despite the fact that RTP and UDP never retransmit lost packets. In addition, when the loss is too severe for the Speex thread to restore (usually more than three consecutive packets lost), the RTCP module readjusts the packet size and sending rate to suit transmission over a poor network.
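
The loss check on sequence numbers can be sketched in Python as follows. The threshold of three consecutive packets is taken from the text; everything else is an assumption for illustration, including the function names, the action labels, and the simplification that packets arrive in order. The sketch only decides what to do and does not call the Speex or oRTP libraries.

```python
CONCEAL_LIMIT = 3   # from the text: more than three consecutive losses is too severe

def missing_between(prev_seq, seq):
    """Packets lost between two received sequence numbers (16-bit wraparound)."""
    return (seq - prev_seq - 1) & 0xFFFF

def plan_actions(received_seqs):
    """For each packet after the first, decide how the receiver should react."""
    actions = []
    prev = received_seqs[0]
    for seq in received_seqs[1:]:
        lost = missing_between(prev, seq)
        if lost == 0:
            actions.append((seq, "play"))
        elif lost <= CONCEAL_LIMIT:
            actions.append((seq, f"conceal {lost}"))   # ask the codec to fill the gap
        else:
            actions.append((seq, "adapt rate"))        # report for rate adjustment
        prev = seq
    return actions

# Sequence numbers wrap from 65535 to 0; packet 0 and packets 2 to 5 are missing.
print(plan_actions([65534, 65535, 1, 6]))
```

Masking with 0xFFFF keeps the gap calculation correct when the 16-bit sequence number wraps around.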





Chapter-4

SYSTEM TEST AND EXPERIMENTAL RESULTS

Generally speaking, session quality and bandwidth occupancy are two of the most important indicators for evaluating a VoIP system. The former is a subjective indicator, so it requires listeners to make assessments. In this paper, session quality is measured by the mean opinion score (MOS), a numerical indication of perceived voice quality after compression and transmission, specified in ITU-T P.800. The MOS is expressed as a single number in the range 1 to 5, where 1 is the lowest perceived audio quality and 5 is the highest. The MOS test also requires that no fewer than 15 listeners take part, and the final result is the arithmetic mean of all the individual scores. It is generally held that a VoIP system provides high-quality voice communication if its MOS is better than 4. In our MOS test, a PC and an embedded terminal are set up as the session participants. Given 96 kbps PCM as the voice input, the PC terminal varies the compression sampling rate from 8 kHz to 32 kHz and captures the network traffic as a reference for bandwidth occupancy. The test results are shown in Table 1.

Table-1 Statistics of the quality in VoIP
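
The MOS calculation described in this chapter is a plain arithmetic mean with a minimum panel size, which the Python sketch below makes explicit. The function name is hypothetical and the ratings are invented sample values for illustration only; they are not the scores collected in this test.

```python
def mean_opinion_score(ratings, min_listeners=15):
    """Arithmetic mean of listener ratings on the 1 to 5 scale."""
    if len(ratings) < min_listeners:
        raise ValueError(f"need at least {min_listeners} listeners")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("each rating must lie between 1 and 5")
    return sum(ratings) / len(ratings)

# Invented ratings from 15 listeners (not measured data).
sample_ratings = [4] * 10 + [5] * 3 + [3] * 2
score = mean_opinion_score(sample_ratings)
print(round(score, 2))
```

A panel smaller than 15 listeners is rejected rather than averaged, reflecting the minimum stated in the text.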

Thanks largely to CELP and RTP, the VoIP system performs very well in voice compression (a ratio of 38 at best), bandwidth conservation and average system utilization. It can provide high-quality voice communication (MOS 4.0) at a minimum bit rate of only 3.9 kbps. The test also shows that the different compression sampling rates have little influence on speech clarity. Apart from perceptible but not annoying noise at 8 kHz, voice at all three sampling rates is fluent and intelligible. In terms of overall capability, the system stands out in both bit rate and voice quality, meeting the goal of designing a real-time voice system on an embedded platform.





Chapter-5

CONCLUSION

This paper has described the design and implementation of an embedded VoIP system in detail. The VoIP system uses the S3C2410 and UDA1341 as its hardware base, supports the SIP and RTP Internet protocols, and employs the ALSA sound driver and the CELP compression algorithm to ensure sound quality. To strengthen real-time and QoS performance in communication, improvements to the server setup and voice processing are attempted and investigated. Finally, as the test results show, this VoIP system can offer a compression ratio of 38 at best and provide high-quality voice communication with a bandwidth of only 3.8 kbps.





REFERENCES

[1] Samrat Ganguly, Sudeept Bhatnagar, "VoIP: Wireless, P2P and New Enterprise Voice over IP," England: Wiley, 2008.

[2] Goode B., "Voice over Internet protocol," Proceedings of the IEEE, 2002, 90(9): 1495.

[3] M. Handley, H. Schulzrinne, E. Schooler, et al., "SIP: Session Initiation Protocol," IETF (RFC 3261), June 2002.

[4] H. Schulzrinne, S. Casner, R. Frederick, et al., "RTP: A Transport Protocol for Real-Time Applications," IETF, January 1996.

[5] Wei Zheng, "The Research and Design of an Embedded VoIP System Based on SIP Protocol," Dahan: Dahan University of Technology, 2008.

[6] Javier Bustos J., Alejandro Bassi A., "Voice compression systems for wireless telephony," 21st International Conference of the Chilean Computer Science Society (SCCC 2001), Punta Arenas, Chile.

[7] Rui Wang, Shiyuan Yang, "The design of a rapid prototype platform for ARM based embedded system," IEEE Transactions on Consumer Electronics, 2004, 50(2): 746-751.

[8] Gurbani V., Sun Xianhe, "Extensions to an Internet signaling protocol to support telecommunication services," IEEE Communications Magazine, 2004, 38(10): 53-59.
