2010 International Conference on Computer Application and System Modeling
An Application of VoIP Communication on Embedded System
Dept of E&C, SJBIT
Chapter-1
INTRODUCTION
Voice over Internet Protocol (VoIP) is a family of transmission technologies for delivering
voice communications over packet-switched networks, such as the Internet and other IP
networks [1]. Because VoIP builds on Internet technologies and the widely connected web
environment, it can offer more versatile services at lower, or even no, cost. Moreover,
combined with embedded technology, VoIP gives a wide range of hand-held devices access to
real-time voice communication over the Internet, which has been a research focus in recent
years. Nevertheless, because of the inherent capacity limits of embedded hardware and the
real-time requirements of voice communication, an embedded VoIP system must weigh many
design factors and choose carefully among protocols and compression algorithms to find the
ones that suit it best. The VoIP system in this paper is built on an ARM9 embedded platform,
adopts SIP and RTP as its Internet protocols, and employs a CELP compression algorithm to
ensure low-latency, high-quality communication. In addition, the UDA1341 sound codec and the
ALSA sound driver are used to support speech recording and playback.
Internet telephony refers to communications services (voice, fax, SMS, and/or voice-messaging
applications) that are transported via the Internet, rather than the public switched telephone
network (PSTN). The steps involved in originating a VoIP telephone call are signaling and
media channel setup, digitization of the analog voice signal, encoding, packetization, and
transmission as Internet Protocol (IP) packets over a packet-switched network.
Chapter-2
BASIC PRINCIPLES
2.1 Basic Principles of VoIP
VoIP is a voice communication technology based on digital voice processing techniques and
Internet applications. The basic steps involved in originating a VoIP call are
analog-to-digital conversion, signal compression/translation, and packetization into IP
packets for transmission; the process is reversed at the receiving end. A VoIP user is
normally concerned above all with voice QoS and real-time response, so a VoIP system must
give priority to choosing protocols and voice compression methods that meet these needs, even
if this sometimes comes at the cost of network reliability.
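The packetization step can be pictured with a minimal sketch. It assumes an 8 kHz sampling rate and 20 ms of audio per packet; neither value is specified at this point in the paper, and the function name is hypothetical. Digitized samples are cut into fixed-size frames, each of which would then be compressed and sent as one packet.

```python
from typing import List

SAMPLE_RATE_HZ = 8000  # assumed telephony sampling rate
FRAME_MS = 20          # assumed audio duration carried by one packet


def split_into_frames(samples: List[int],
                      sample_rate_hz: int = SAMPLE_RATE_HZ,
                      frame_ms: int = FRAME_MS) -> List[List[int]]:
    """Cut a list of PCM samples into equal frames, one frame per packet."""
    samples_per_frame = sample_rate_hz * frame_ms // 1000
    frames = []
    for start in range(0, len(samples), samples_per_frame):
        frame = samples[start:start + samples_per_frame]
        # Pad the final short frame with silence so every frame has equal size.
        frame = frame + [0] * (samples_per_frame - len(frame))
        frames.append(frame)
    return frames
```

With these assumed values each frame holds 160 samples; a shorter frame lowers latency but increases per-packet header overhead.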
2.1.1 SIP Protocol
A VoIP session can be initialized in various ways, depending on the protocol or standard
used. Typical examples are H.323 and the Session Initiation Protocol (SIP). However, the
H.323 standard relies heavily on centralized network servers to launch calls, and its message
format is too complex for embedded hardware to parse efficiently. Its limited extensibility
and time-consuming encoding process also make it a poor fit for this VoIP system. SIP is an
application layer signaling protocol widely used for creating, modifying and terminating
multimedia communication sessions [3]. Like HTTP and SMTP, SIP uses text-based messages, and
it has the merits of a simpler structure and faster response. VoIP systems that use SIP for
session initiation therefore usually achieve better real-time performance. Furthermore, SIP
is a distributed control protocol: it does not rely on central servers for network management
and is easy to deploy. These qualities make SIP easy to use and well suited to networks of
embedded terminals.
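To illustrate why a text-based format is cheap to handle on embedded hardware, the sketch below builds and parses a heavily simplified INVITE request using only string operations. It is an illustration, not the paper's implementation (which uses the osip/eXosip library): the function names and addresses are hypothetical, and a request that conforms to RFC 3261 needs further headers such as Via, Max-Forwards and Contact.

```python
def build_invite(caller: str, callee: str, call_id: str) -> str:
    """Build a simplified, illustrative SIP INVITE (not a complete request)."""
    lines = [
        f"INVITE sip:{callee} SIP/2.0",  # request line: method, URI, version
        f"From: <sip:{caller}>",
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        "CSeq: 1 INVITE",
        "Content-Length: 0",
    ]
    # Header lines end with CRLF; an empty line terminates the header section.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_request(message: str):
    """Split a request into method, URI, version and a header dictionary."""
    head = message.split("\r\n\r\n", 1)[0]
    start_line, *header_lines = head.split("\r\n")
    method, uri, version = start_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip()] = value.strip()
    return method, uri, version, headers
```

Parsing needs nothing beyond line splitting, in contrast to the binary encoding used by H.323 messages.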
2.1.2 RTP Protocol
RTP (Real-time Transport Protocol) is an application layer protocol designed to deliver
audio and video over the Internet. Because it is well suited to real-time service, RTP often
works alongside SIP as a basic network protocol in a VoIP system [4]. RTP is usually used
together with RTCP (RTP Control Protocol). While RTP transports the audio packets over the
Internet, RTCP is responsible for monitoring transmission statistics and maintaining QoS.
RTP resides in the application layer and can run over either UDP (User Datagram Protocol) or
TCP (Transmission Control Protocol) at the transport layer. The VoIP system in this paper
puts the timely transfer of the whole voice stream ahead of the exact delivery of every data
packet, because occasional losses of small fragments of an audio stream are usually
unnoticeable and can also be repaired [5]. The latency that TCP introduces through
connection establishment and error recovery renders it ill-suited to prompt voice
communication, whereas UDP, a low-latency, connectionless service, is more suitable for
immediate transmission. Thus, all the RTP applications in our VoIP system adopt UDP, instead
of TCP, as the transport layer protocol.
RTP was developed by the Audio/Video Transport working group of the IETF standards
organization. RTP is used in conjunction with other protocols such as H.323 and RTSP. The RTP
standard defines a pair of protocols, RTP and RTCP. RTP is used for the transfer of
multimedia data, and RTCP is used to periodically send control information and QoS parameters.
RTP is designed for end-to-end, real-time transfer of stream data. The protocol provides
facilities for jitter compensation and for detecting out-of-sequence arrival of data, both of
which are common during transmission over an IP network. RTP supports data transfer to
multiple destinations through multicast. RTP is regarded as the primary standard for
audio/video transport in IP networks and is used with an associated profile and payload format.
X (Extension) : (1 bit) Indicates presence of an Extension header between standard header
and payload data. This is application or profile specific.
CC (CSRC Count) : (4 bits) Contains the number of CSRC identifiers (defined below) that
follow the fixed header.
M (Marker) : (1 bit) Used at the application level and defined by a profile. If it is set, it
means that the current data has some special relevance for the application.
PT (Payload Type) : (7 bits) Indicates the format of the payload and determines its
interpretation by the application. This is specified by an RTP profile. For example, see RTP
Profile for audio and video conferences with minimal control.
Sequence Number : (16 bits) The sequence number is incremented by one for each RTP data
packet sent and is to be used by the receiver to detect packet loss and to restore packet
sequence. The RTP does not take any action on packet loss; it is left to the application to take
the desired action.
Timestamp : (32 bits) Used to enable the receiver to play back the received samples at
appropriate intervals. When several media streams are present, the timestamps are
independent in each stream, and may not be relied upon for media synchronization. The
granularity of the timing is application specific. For example, an audio application that
samples data once every 125 µs (8 kHz, a common sample rate in digital telephony) could
use that value as its clock resolution. The clock granularity is one of the details that is
specified in the RTP profile for an application.
SSRC : (32 bits) Synchronization source identifier uniquely identifies the source of a stream.
The synchronization sources within the same RTP session will be unique.
CSRC : Contributing source IDs enumerate contributing sources to a stream which has been
generated from multiple sources.
Extension header : (optional) The first 32-bit word contains a profile-specific identifier (16
bits) and a length specifier (16 bits) that indicates the length of the extension in 32-bit units.
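The field widths listed above can be checked with a short sketch that packs and parses the 12-byte fixed RTP header. It assumes the RFC 3550 layout, in which a 2-bit version field (value 2) and a 1-bit padding flag precede the X bit; those two fields are not in the list above. The function names are hypothetical, and CSRC entries and extension headers are left out.

```python
import struct

RTP_VERSION = 2  # version number defined by RFC 3550


def pack_rtp_header(payload_type: int, sequence_number: int,
                    timestamp: int, ssrc: int, marker: bool = False) -> bytes:
    """Pack a fixed RTP header with no padding, extension or CSRC entries."""
    byte0 = RTP_VERSION << 6  # V=2, P=0, X=0, CC=0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)  # M bit, 7-bit PT
    # Network byte order: two bytes, 16-bit sequence, 32-bit timestamp and SSRC.
    return struct.pack("!BBHII", byte0, byte1,
                       sequence_number & 0xFFFF,
                       timestamp & 0xFFFFFFFF,
                       ssrc & 0xFFFFFFFF)


def parse_rtp_header(packet: bytes) -> dict:
    """Decode the first 12 bytes of an RTP packet into named fields."""
    byte0, byte1, seq, ts, ssrc = struct.unpack("!BBHII", packet[:12])
    return {
        "version": byte0 >> 6,
        "padding": (byte0 >> 5) & 1,
        "extension": (byte0 >> 4) & 1,
        "csrc_count": byte0 & 0x0F,
        "marker": byte1 >> 7,
        "payload_type": byte1 & 0x7F,
        "sequence_number": seq,
        "timestamp": ts,
        "ssrc": ssrc,
    }
```

For audio sampled at 8 kHz and sent in 20 ms packets (an assumed packet duration), the timestamp advances by 160 per packet while the sequence number advances by one.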
2.1.3 Voice Compression Technology in VoIP
Voice compression is the process of compacting voice data into a smaller volume for easier
transport. In a VoIP system, voice compression considerably reduces the volume of audio data,
which relieves the network load and helps ensure real-time response in VoIP calls. Among
existing voice coding methods, Code Excited Linear Prediction (CELP) is generally considered
to be among the most successful compression algorithms. CELP speech coding is based on the
source-filter model, which assumes that the vocal cords are the source of speech and that the
vocal tract acts as a filter that shapes the various sounds of the voice. Since the source
and filter parameters of a voice occupy very little data, and the model can represent the
voice using these parameters alone, CELP can record and store a speaker's voice at a
remarkably low bit rate [6]. Normally, CELP keeps the transmission rate between 2 kbps and
16 kbps.
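The source-filter idea can be sketched in a few lines. This is a toy illustration, not a CELP codec and not the coder used in this system: an excitation sequence (the source) drives an all-pole linear prediction filter (the vocal tract), and the encoder picks the codebook entry whose synthesized output is closest to the target frame. Real CELP coders add gain terms, an adaptive codebook and perceptual weighting; the function names, coefficients and codebook here are hypothetical.

```python
from typing import List, Sequence


def synthesize(excitation: Sequence[float],
               lpc_coeffs: Sequence[float]) -> List[float]:
    """All-pole synthesis filter: y[n] = e[n] - sum_k a_k * y[n-k]."""
    output: List[float] = []
    for n, e in enumerate(excitation):
        y = e
        for k, a in enumerate(lpc_coeffs, start=1):
            if n - k >= 0:
                y -= a * output[n - k]
        output.append(y)
    return output


def best_codebook_index(target: Sequence[float],
                        codebook: Sequence[Sequence[float]],
                        lpc_coeffs: Sequence[float]) -> int:
    """Pick the excitation whose synthesized output has least squared error."""
    def error(candidate: Sequence[float]) -> float:
        synth = synthesize(candidate, lpc_coeffs)
        return sum((t - s) ** 2 for t, s in zip(target, synth))

    return min(range(len(codebook)), key=lambda i: error(codebook[i]))
```

Only the chosen index and the filter coefficients need to be transmitted for a frame, rather than the samples themselves, which is where the low bit rate comes from.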
2.2 Architecture of Embedded System
The embedded platform provides all the hardware support the VoIP system needs, such as voice
sampling, playback, sending and receiving [7]. Considering cost and performance, the system
uses the Samsung S3C2410 as its central processor. The S3C2410 is designed to provide a
cost-effective, power-efficient and high-performance microprocessor solution for
communication applications and hand-held devices. It is built on an ARM9 core and supports
bus interfaces and peripherals including IIC, IIS, an MMU (Memory Management Unit), 4-channel
DMA, a 2-channel USB controller and an LCD controller, making it well qualified as the
hardware infrastructure of the VoIP application.
The audio codec is a key component of the embedded platform; the Philips UDA1341 is used for
speech capture and playback. Figure 1 shows the wiring diagram of the S3C2410 and the
UDA1341. The two chips are connected by the IIS and L3 buses, and the sound codec captures
and plays voice under the control of the S3C2410.
IIS (Inter-IC Sound) is a serial bus interface for connecting digital audio devices. Its
distinctive feature is that the clock signals are carried separately from the data signals,
which substantially reduces signal jitter and distortion during digital/analog conversion and
allows the codec to deliver high sound definition. Moreover, IIS connects to the FIFO data
channel through DMA, where data is sent and received synchronously, which gives the UDA1341
high speed in voice recording and playback.
L3 is the built-in control bus interface of the UDA1341. It connects the UDA1341 to the
S3C2410 through 3 general-purpose GPIO pins and allows the processor to control the codec's
signal timing and operating mode. The L3 bus is also used to control some of the codec's
audio features, including volume adjustment, bass boost and soft mute. In addition to the
audio chip, the development board also provides an Ethernet card, a Wi-Fi card, a serial
port, USB and other peripheral device interfaces, which meet the basic communication
requirements of the VoIP system.
Figure-1 Wiring diagram of S3C2410 and UDA1341
Chapter-3
MASTER DESIGN OF VoIP SYSTEM
3.1 General Layout
The VoIP system in this paper is architecturally divided into two main parts: the SIP server
and the client software. The server part is responsible for locating the calling and called
parties and establishing the communication environment before a session actually starts. For
economy and real-time performance, our design moderately simplifies the server setup. The
main function of the client side is to send and receive compressed speech streams. To improve
real-time performance and make better use of UDP's advantages, the voice streams travel
directly between client terminals, with no SIP servers on the transmission route. Bypassing
the servers in this way not only greatly reduces the latency of a voice session but also
considerably relieves the load on the servers.
3.2 Pattern Layout of SIP Server
3.2.1 Setup of SIP Servers
The SIP server section consists of three parts:
3.2.1.1 User Agent:
Although the user agent (UA) structurally belongs to the client software section, its actual
function is to act as an extension of the SIP server and handle all SIP requests and responses.
3.2.1.2 Register/Location server:
When serving as a register server, the Register/Location server dynamically establishes the
mapping between users' logical and physical addresses. This mapping can further be used to
support call routing devices and subscriber mobility. When it receives a location request
from a user, the server looks up and returns the required user's IP address from the mapping
list it holds.
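The mapping described above amounts to a lookup table from logical to physical address. A minimal in-memory sketch follows; the class name, method names and addresses are hypothetical, and a real registrar would also handle authentication and binding expiry.

```python
from typing import Dict, Optional, Tuple


class LocationService:
    """Maps a user's logical address to the current physical (IP, port)."""

    def __init__(self) -> None:
        self._bindings: Dict[str, Tuple[str, int]] = {}

    def register(self, logical_address: str, ip: str, port: int) -> None:
        # A later registration overwrites the earlier one, so a user who
        # moves to a new IP address remains reachable (subscriber mobility).
        self._bindings[logical_address] = (ip, port)

    def locate(self, logical_address: str) -> Optional[Tuple[str, int]]:
        # Returns None when the user has never registered.
        return self._bindings.get(logical_address)
```

A caller only needs to know the logical address; the server resolves it to wherever the callee registered most recently.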
3.2.1.3 Proxy/Redirect Server:
In most circumstances the proxy server waits for incoming requests from clients and relays
them to the specified resource server according to its access policy. After the resource
server responds, the requested content is sent back to the originating client. If a user
registered with the server moves to a new location or changes IP address, the redirection
function of SIP is activated: the server traces and consults the relevant servers and returns
the user's new IP address. In most VoIP systems, a separate server is set up to handle
redirection. When a request needs to be redirected, the proxy queries the redirect server for
the user's new destination address. Although a dedicated server can simplify billing
management and overall control of the VoIP system, it also increases the cost of building the
system and the response time. In this paper, the proxy and redirection software are deployed
together on one server machine in order to cut equipment expenditure and shorten response
time. The cost is reduced because fewer machines are involved, and real-time performance
improves because most redirection requests come from the proxy server, and local accesses are
far faster than remote ones.
3.2.2 Messaging System of SIP Servers
Figure 2 shows the messaging system in the SIP servers; the solid lines and the dashed lines
represent SIP requests and acknowledgements, respectively.
Figure-2 Message Flow in SIP protocol
The initial session request triggered by a VoIP client is first delivered from the SIP proxy
to the location server to obtain the IP address of the called party. The location server then
sends an invite message to the target IP (or the proxy of this IP) and returns a
wait-to-confirm message to the calling party. If the IP address is valid and the PC client
accepts the invite, the called party returns an acknowledgement message. Once this is
confirmed, the two parties in the session have exchanged their IP addresses and the
communication environment is ready for the upcoming voice transmission.
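The exchange described above can be traced with a small simulation. The message labels follow the wording of this section rather than actual SIP method names, and the function name and the two failure branches ("not found", "decline") are assumptions added so that the sketch is complete; the paper does not describe them.

```python
from typing import Dict, List, Tuple

Message = Tuple[str, str, str]  # (sender, receiver, message label)


def setup_session(caller: str, callee: str, bindings: Dict[str, str],
                  accepts: bool = True) -> Tuple[bool, List[Message]]:
    """Return (session_ready, ordered message trace) for one call attempt."""
    trace: List[Message] = [
        (caller, "proxy", "session request"),
        ("proxy", "location server", "location query"),
    ]
    callee_ip = bindings.get(callee)
    if callee_ip is None:
        # Assumed failure path: the called party is not registered.
        trace.append(("location server", caller, "not found"))
        return False, trace
    trace.append(("location server", callee, "invite"))
    trace.append(("location server", caller, "wait-to-confirm"))
    if not accepts:
        # Assumed failure path: the called party rejects the invite.
        trace.append((callee, caller, "decline"))
        return False, trace
    trace.append((callee, caller, "acknowledgement"))
    return True, trace
```

Once the trace ends with an acknowledgement, both parties hold each other's address and the media can flow directly between them.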
3.3 Pattern Layout of Client Software
3.3.1 Overall Structure of Client-side
At the user level, the client software offers the communication control interface, while
background procedures handle all the sound processing and data transmission. The software
structure of the client side comprises three parts, as shown in Figure 3.
Figure-3 Integral Architecture of Client
3.3.1.1 GUI interface:
The GUI is the graphical channel between users and the primary control thread. Through this
channel, a VoIP user can control and monitor the status of the background threads whenever
necessary.
3.3.1.2 Voice processing:
Threads in the voice processing module are in charge of all VoIP sound operations, including
capturing, playing, compressing and decompressing. The work is divided into two
sub-procedures according to the order of operations: one samples, compresses and sends; the
other receives, decompresses and plays.
3.3.1.3 Data channel:
The client software provides two communication channels (each with its own thread) to
transport all the data generated by the user's voice and operations. The SIP communication
thread (the client-to-server thread), implemented with the osip/eXosip open source library,
interacts with the SIP proxy to help the client establish the initial session connection [6].
Functionally, the client-to-server thread also corresponds to the user agent in the SIP
server section. The RTP communication thread (the client-to-client thread) uses the ORTP
library as its protocol base and gives a user direct RTP access to the other party. Unlike
ordinary VoIP solutions, the VoIP system in this paper assigns RTP communication an
independent thread. Although this raises implementation complexity, the improvement outweighs
the trouble: voice packets sent by one user bypass the SIP servers and reach the other client
directly, so the latency incurred during a session is markedly reduced.
After the voice processing thread finishes, the compressed voice data is passed to the RTP
thread and encapsulated into IP packets for delivery. The relationship between these two
threads clearly matches the producer-consumer model, and an effective way to improve the
throughput between them is to set up a shared critical area, that is, a buffer that both
threads access. With this shared buffer the two threads can run concurrently, and system idle
time is greatly reduced.
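The producer-consumer relationship can be sketched with a bounded, thread-safe queue standing in for the shared area. This is an illustration in Python, not the embedded implementation; the function name, the buffer size and the use of a sentinel object to signal the end of the stream are assumptions.

```python
import queue
import threading
from typing import List


def run_pipeline(frames: List[bytes], buffer_size: int = 8) -> List[bytes]:
    """Pass frames from a voice (producer) thread to an RTP (consumer) thread."""
    shared: "queue.Queue[object]" = queue.Queue(maxsize=buffer_size)
    sent: List[bytes] = []
    done = object()  # sentinel that marks the end of the stream

    def voice_thread() -> None:
        for frame in frames:
            shared.put(frame)  # blocks while the buffer is full
        shared.put(done)

    def rtp_thread() -> None:
        while True:
            item = shared.get()  # blocks while the buffer is empty
            if item is done:
                break
            sent.append(item)  # a real client would packetize and send here

    producer = threading.Thread(target=voice_thread)
    consumer = threading.Thread(target=rtp_thread)
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return sent
```

The bounded buffer lets each thread proceed whenever there is work or space available, instead of waiting for the other to finish a whole batch.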
3.3.3 Implementation of RTP Transmission
The applications in the RTP module are implemented on top of the ORTP software package,
as shown in Figure 5.
Figure-5. Integral Design of the RTP Module
The primary control module sits at the center of the RTP thread; it monitors the other
sub-modules and keeps them working in good order. The major function of the RTP module is to
send the processed voice to, and receive it from, the other side of the session. It is also
responsible for generating RTP statistics and QoS information for the subsequent RTCP quality
check. By periodically exchanging and verifying those statistics, the RTCP module can detect
any abnormal situation and report it to the primary control module. If necessary, the control
module adjusts the RTP rate and packet load to maintain transmission QoS.
According to the RTP standard, each RTP packet contains two parts, the payload and the
header. The payload carries the voice data, while the header carries the information needed
for QoS maintenance. The most important piece of this auxiliary information is the RTP
sequence number (RSN). The RSN is a number assigned to each RTP packet by the sending client.
Because the RSN increases by one with each packet, the RTP receiving thread can easily sort
voice packets and reassemble them in their original order. In the voice processing thread,
the RSN is used together with Speex to make the conversation more robust. The RTP receiving
thread first checks the RSNs for packet loss. If packets have been lost, the RTP thread
informs Speex, which fills in the lost fragments and conceals the gap in the voice stream.
With this quality assurance mechanism, the system tolerates data loss well even though RTP
and UDP never retransmit lost packets. Moreover, when the loss is too severe for the Speex
thread to repair (usually more than three consecutive packets), the RTCP module readjusts the
packet size and sending rate to suit the poor network environment.
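The sequence-number check can be sketched as follows. The code reorders received sequence numbers, lists the missing ones, and flags a run of more than three consecutive losses, the threshold mentioned above. It also accounts for the 16-bit sequence number wrapping from 65535 back to 0. The function names are hypothetical, and the sketch assumes the received numbers span far less than half of the sequence space.

```python
from typing import List, Sequence, Tuple


def seq_delta(new: int, old: int) -> int:
    """Signed distance from old to new in 16-bit sequence-number space."""
    return ((new - old + 0x8000) & 0xFFFF) - 0x8000


def reorder_and_find_losses(received: Sequence[int]) -> Tuple[List[int], List[int]]:
    """Return (sequence numbers in sending order, missing sequence numbers)."""
    if not received:
        return [], []
    first = received[0]
    # Sort by distance from the first packet so wrap-around is handled.
    ordered = sorted(set(received), key=lambda s: seq_delta(s, first))
    missing: List[int] = []
    for prev, cur in zip(ordered, ordered[1:]):
        for step in range(1, seq_delta(cur, prev)):
            missing.append((prev + step) & 0xFFFF)
    return ordered, missing


def too_severe(missing: Sequence[int], limit: int = 3) -> bool:
    """True if more than `limit` consecutive packets were lost."""
    run = longest = 0
    prev = None
    for s in missing:
        run = run + 1 if prev is not None and seq_delta(s, prev) == 1 else 1
        longest = max(longest, run)
        prev = s
    return longest > limit
```

A short gap would be handed to the codec for concealment, while a run longer than the limit would prompt the sender to change packet size and rate.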
Chapter-4
SYSTEM TEST AND EXPERIMENTAL RESULTS
Generally speaking, session quality and bandwidth occupancy are two of the most important
indicators for evaluating a VoIP system. The former is subjective, so it requires listeners
to give ratings. In this paper, session quality is measured by the mean opinion score (MOS),
a numerical indication, specified in ITU-T P.800, of the perceived voice quality after
compression and transmission. The MOS is expressed as a single number in the range 1 to 5,
where 1 is the lowest perceived audio quality and 5 is the highest. MOS testing also requires
that no fewer than 15 listeners take part in a test; the final result is the arithmetic mean
of all the individual scores. It is generally believed that a VoIP system provides
high-quality voice communication if its MOS is better than 4. In our MOS test, a PC and an
embedded terminal are set up as the session participants. Given 96 Kbps PCM as the voice
input, the PC terminal varies the compression sampling rate from 8 kHz to 32 kHz and captures
the network traffic as the reference for bandwidth occupancy. The test results are shown in
Table 1.
Table -1 Statistics of the quality in VoIP
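The MOS calculation described above is a plain arithmetic mean with two checks, sketched below. The function name and the scores in the usage example are hypothetical and are not the measurements from this test.

```python
from typing import Sequence


def mean_opinion_score(scores: Sequence[float], min_listeners: int = 15) -> float:
    """Arithmetic mean of listener scores, each between 1 and 5."""
    if len(scores) < min_listeners:
        raise ValueError("not enough listeners for a MOS test")
    if any(not 1 <= s <= 5 for s in scores):
        raise ValueError("each score must lie in the range 1 to 5")
    return sum(scores) / len(scores)


# Hypothetical panel: ten listeners score 4 and five listeners score 5.
example_scores = [4] * 10 + [5] * 5
example_mos = mean_opinion_score(example_scores)
```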
Thanks largely to CELP and RTP, the VoIP system performs very well in voice compression (a
compression rate of 38 at best), bandwidth conservation and average system utilization. It
provides high-quality voice communication (MOS 4.0) at a minimum bit rate of only 3.9 Kbps.
The test also shows that the compression sampling rate has little influence on speech
clarity. Apart from perceptible but not annoying noise at 8 kHz, voices at all three sampling
rates are fluent and intelligible. Overall, the system stands out in both bit rate and voice
quality, fulfilling the goal of designing a real-time voice system on the embedded platform.
Chapter-5
CONCLUSION
This paper has described the design and implementation of an embedded VoIP system in detail.
The system takes the S3C2410 and UDA1341 as its hardware base, supports the SIP and RTP
Internet protocols, and employs the ALSA sound driver and CELP compression algorithms to
ensure sound quality. To strengthen real-time and QoS performance in communication,
improvements to the server setup and voice processing were made and investigated. Finally, as
the test results show, the system achieves a compression rate of 38 at best and provides
high-quality voice communication with a bandwidth of only 3.8 Kbps.
REFERENCES
[1] Samrat Ganguly, Sudeept Bhatnagar, "VoIP: Wireless, P2P and New Enterprise Voice over
IP," England: Wiley, 2008.
[2] Goode B., "Voice over Internet Protocol," Proceedings of the IEEE, 2002, 90(9):1495.
[3] M. Handley, H. Schulzrinne, E. Schooler, et al., "SIP: Session Initiation Protocol,"
IETF (RFC 3261), June 2002.
[4] H. Schulzrinne, S. Casner, R. Frederick, et al., "RTP: A Transport Protocol for Real-Time
Applications," IETF, January 1996.
[5] Wei Zheng, "The Research and Design of an Embedded VoIP System Based on SIP
Protocol," Dahan: Dahan University of Technology, 2008.
[6] Javier Bustos J., Alejandro Bassi A., "Voice compression systems for wireless telephony,"
21st International Conference of the Chilean Computer Science Society (SCCC 2001), Punta
Arenas, Chile.
[7] Rui Wang, Shiyuan Yang, "The design of a rapid prototype platform for ARM based
embedded system," IEEE Transactions on Consumer Electronics, 2004, 50(2):746-751.
[8] Gurbani V., Sun Xianhe, "Extensions to an Internet signaling protocol to support
telecommunication services," IEEE Communications Magazine, 2004, 38(10):53-59.