2010 International Conference on Computer Application and System Modeling

Upload: anand-kumar

Post on 08-Apr-2018

    An Application of VoIP Communication on Embedded System

    Dept of E&C, SJBIT

    Chapter-1

    INTRODUCTION

    Voice over Internet Protocol (VoIP) is a family of transmission technologies for delivering
    voice communications over packet-switched networks such as the Internet and other IP
    networks [1]. Because VoIP builds on Internet technologies and an extensively connected web
    environment, it can offer more versatile services at lower, or even no, cost. Combined with
    embedded technology, VoIP also gives a wide range of hand-held devices access to real-time
    voice communication over the Internet, which has been a research focus in recent years.
    However, because of the inherent capacity limits of embedded hardware and the real-time
    requirements of voice communication, an embedded VoIP system must weigh many design factors
    and choose carefully among protocols and compression algorithms for those that suit it best.
    The VoIP system in this paper is built on an ARM9 embedded platform, adopts SIP and RTP as its
    Internet protocols, and employs CELP compression to achieve low-latency, high-quality
    communication. In addition, the UDA1341 sound codec and the ALSA sound driver handle speech
    recording and playback.

    Internet telephony refers to communications services (voice, fax, SMS, and/or voice-messaging
    applications) that are transported via the Internet rather than the public switched telephone
    network (PSTN). The steps involved in originating a VoIP telephone call are signaling and
    media channel setup, digitization of the analog voice signal, encoding, packetization, and
    transmission as Internet Protocol (IP) packets over a packet-switched network.

    Chapter-2

    BASIC PRINCIPLES

    2.1 Basic Principles of VoIP

    VoIP is a voice communication technology based on digital voice processing and Internet
    applications. The basic steps in originating a VoIP call are analog-to-digital conversion,
    signal compression/translation, and IP packetization for transmission; the process is
    reversed at the receiving end. VoIP users typically care most about voice QoS and real-time
    response, so a VoIP system must give priority to choosing protocols and voice compression
    methods that meet these needs, even if this sometimes comes at the cost of network
    reliability.

    2.1.1 SIP Protocol

    A VoIP session can be initiated in various ways, depending on the protocol or standard used.
    Typical examples are H.323 and the Session Initiation Protocol (SIP). H.323, however, relies
    heavily on centralized network servers to launch calls, and its message format is too complex
    for embedded hardware to parse efficiently. Its limited extensibility and time-consuming
    encoding process also make it a poor fit for this VoIP system. SIP is an application-layer
    signaling protocol widely used for creating, modifying and terminating multimedia
    communication sessions [3]. Like HTTP and SMTP, SIP uses text-based messages, which gives it
    a simpler structure and faster response. VoIP systems that use SIP for session initiation
    therefore usually achieve better real-time performance. Furthermore, SIP is a distributed
    control protocol: it does not depend on a central server for network management and is easy
    to deploy. These qualities make SIP easy to use and well suited to a network of embedded
    terminals.
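
    To make the point about text-based messages concrete, the following is a minimal sketch of
    building and parsing a SIP-style request in Python. The addresses and the Call-ID value are
    hypothetical, and the message deliberately omits several headers that a complete SIP request
    carries (for example Via and Max-Forwards); it only illustrates why a line-oriented text
    format is cheap to handle on a small device.

```python
# Hypothetical addresses and Call-ID, for illustration only.
def build_invite(caller: str, callee: str, call_id: str, cseq: int = 1) -> str:
    """Build a simplified SIP INVITE as CRLF-delimited text (not a complete request)."""
    lines = [
        f"INVITE sip:{callee} SIP/2.0",   # request line: method, target, version
        f"From: <sip:{caller}>",
        f"To: <sip:{callee}>",
        f"Call-ID: {call_id}",
        f"CSeq: {cseq} INVITE",
        "Content-Length: 0",
    ]
    # Headers end with an empty line, as in HTTP.
    return "\r\n".join(lines) + "\r\n\r\n"


def parse_message(text: str):
    """Split a message into its start line and a dict of header fields."""
    head = text.split("\r\n\r\n", 1)[0]
    start_line, *header_lines = head.split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")  # split at the first colon only
        headers[name.strip()] = value.strip()
    return start_line, headers


msg = build_invite("alice@example.com", "bob@example.com", "a84b4c76e66710")
start_line, headers = parse_message(msg)
print(start_line)
print(headers["CSeq"])
```

    Parsing needs nothing beyond string splitting, which is the property the comparison with
    HTTP and SMTP relies on.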

    2.1.2 RTP protocol

    RTP (Real-time Transport Protocol) is an application-layer protocol designed to deliver
    audio and video over the Internet. Because it is well suited to real-time service, RTP often
    works with SIP as the basic network protocol in a VoIP system [4]. RTP is normally used
    together with RTCP (RTP Control Protocol): while RTP transports the audio packets over the
    Internet, RTCP monitors transmission statistics and helps maintain QoS.

    RTP resides in the application layer and can run over either UDP (User Datagram Protocol) or
    TCP (Transmission Control Protocol) at the transport layer. The VoIP system in this paper
    cares more about the timely transfer of the voice stream as a whole than about the exact
    delivery of every packet, because occasional losses of small fragments of an audio stream
    are usually unnoticeable and can also be repaired [5]. The latency that TCP introduces
    through connection establishment and retransmission-based error recovery makes it poorly
    suited to prompt voice communication, whereas UDP, with its low latency and connectionless
    service, is better suited to immediate transmission. All RTP applications in our VoIP system
    therefore use UDP rather than TCP as the transport-layer protocol.

    RTP was developed by the Audio/Video Transport working group of the IETF and is used in
    conjunction with other protocols such as H.323 and RTSP. The RTP standard defines a pair of
    protocols, RTP and RTCP: RTP carries the multimedia data, and RTCP periodically sends
    control information and QoS parameters.

    RTP is designed for end-to-end, real-time transfer of stream data. It provides facilities
    for jitter compensation and for detecting out-of-sequence arrival of data, both of which are
    common on IP networks. RTP supports data transfer to multiple destinations through
    multicast. It is regarded as the primary standard for audio/video transport in IP networks
    and is used with an associated profile and payload format.

    X (Extension): (1 bit) Indicates the presence of an extension header between the standard
    header and the payload data. This is application or profile specific.

    CC (CSRC Count): (4 bits) Contains the number of CSRC identifiers (defined below) that
    follow the fixed header.

    M (Marker): (1 bit) Used at the application level and defined by a profile. If it is set,
    the current data has some special relevance for the application.

    PT (Payload Type): (7 bits) Indicates the format of the payload and determines its
    interpretation by the application. This is specified by an RTP profile. For example, see the
    RTP Profile for audio and video conferences with minimal control.

    Sequence Number: (16 bits) Incremented by one for each RTP data packet sent, and used by the
    receiver to detect packet loss and to restore packet sequence. RTP itself takes no action on
    packet loss; it is left to the application to take the desired action.

    Timestamp: (32 bits) Enables the receiver to play back the received samples at appropriate
    intervals. When several media streams are present, the timestamps are independent in each
    stream and may not be relied upon for media synchronization. The granularity of the timing
    is application specific. For example, an audio application that samples data once every
    125 µs (8 kHz, a common sample rate in digital telephony) could use that value as its clock
    resolution. The clock granularity is one of the details specified in the RTP profile for an
    application.

    SSRC: (32 bits) The synchronization source identifier uniquely identifies the source of a
    stream. The synchronization sources within the same RTP session are unique.

    CSRC: Contributing source IDs enumerate the contributing sources of a stream that has been
    generated from multiple sources.

    Extension header: (optional) The first 32-bit word contains a profile-specific identifier
    (16 bits) and a length specifier (16 bits) that gives the length of the extension in 32-bit
    units.
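
    The field layout above can be sketched with Python's struct module. This is an illustration
    under stated assumptions, not the system's implementation: the 2-bit version and 1-bit
    padding fields that precede X in the first byte are taken from the RTP specification, since
    their description does not survive in this text, and the payload type 97, the SSRC value and
    the 20 ms frame length are hypothetical.

```python
import struct

RTP_VERSION = 2  # version field value for current RTP (assumption from the RTP spec)


def pack_rtp_header(pt, seq, timestamp, ssrc, marker=0, csrcs=()):
    """Pack the 12-byte fixed RTP header followed by any CSRC identifiers."""
    # First byte: V (2 bits), P (1 bit, 0 here), X (1 bit, 0 here), CC (4 bits).
    byte0 = (RTP_VERSION << 6) | (len(csrcs) & 0x0F)
    # Second byte: M (1 bit), PT (7 bits).
    byte1 = ((marker & 1) << 7) | (pt & 0x7F)
    fixed = struct.pack("!BBHII", byte0, byte1,
                        seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)
    return fixed + b"".join(struct.pack("!I", c) for c in csrcs)


def unpack_rtp_header(data):
    """Decode the fixed header and CSRC list into a dict of named fields."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    cc = byte0 & 0x0F
    csrcs = struct.unpack(f"!{cc}I", data[12:12 + 4 * cc])
    return {
        "version": byte0 >> 6,
        "padding": (byte0 >> 5) & 1,
        "extension": (byte0 >> 4) & 1,
        "cc": cc,
        "marker": byte1 >> 7,
        "pt": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
        "csrcs": csrcs,
    }


def timestamp_step(sample_rate_hz, frame_ms):
    """Timestamp increment per packet when the clock ticks once per sample."""
    return sample_rate_hz * frame_ms // 1000


header = pack_rtp_header(pt=97, seq=1000, timestamp=160, ssrc=0x1234ABCD, marker=1)
fields = unpack_rtp_header(header)
print(len(header), fields["pt"], fields["seq"], timestamp_step(8000, 20))
```

    With an 8 kHz clock (one tick every 125 µs) and an assumed 20 ms frame, the timestamp
    advances by 160 per packet while the sequence number advances by one.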

    2.1.3 Voice Compression Technology in VoIP

    Voice compression reduces the size of voice data so that it can be transported more
    efficiently. In a VoIP system, compression considerably reduces the volume of audio data,
    which relieves network load and helps ensure real-time response during calls. Among existing
    voice coding methods, Code Excited Linear Prediction (CELP) is generally considered one of
    the most successful. CELP speech coding is based on the source-filter model, which treats
    the vocal cords as the source of speech and the vocal tract as a filter that shapes the
    sound. Because the source and filter parameters of a voice are compact and the model can
    represent a voice using these parameters alone, CELP can encode a speaker's voice at a
    remarkably low bit rate [6]. CELP typically operates at bit rates between 2 kbps and
    16 kbps.
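
    As a rough worked example of what these bit rates mean on the wire, the sketch below
    computes the payload size per packet and the resulting network rate once packet headers are
    added. The 20 ms frame length is an assumption, not a figure from this paper; the header
    sizes are the standard minimum sizes of an IPv4 header without options (20 bytes), a UDP
    header (8 bytes) and the fixed RTP header (12 bytes).

```python
HEADER_BYTES = 20 + 8 + 12  # IPv4 (no options) + UDP + fixed RTP header


def payload_bytes(codec_bps, frame_ms=20):
    """Bytes of compressed speech carried by one packet of frame_ms milliseconds."""
    return codec_bps * frame_ms / 1000 / 8


def wire_bps(codec_bps, frame_ms=20):
    """Bit rate on the network including per-packet IP/UDP/RTP headers."""
    packets_per_second = 1000 / frame_ms
    return (payload_bytes(codec_bps, frame_ms) + HEADER_BYTES) * 8 * packets_per_second


for rate in (2000, 16000):
    print(rate, payload_bytes(rate), wire_bps(rate))
```

    Under these assumptions a 2 kbps stream carries only 5 bytes of speech per packet, so the
    40 bytes of headers dominate the traffic; at low codec rates the packetization interval
    matters as much as the codec itself.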

    2.2 Architecture of Embedded System

    The embedded platform provides all the hardware support the VoIP system needs, including
    voice sampling, playback, sending and receiving [7]. Considering both cost and performance,
    the system uses the Samsung S3C2410 as its central processor. The S3C2410 is designed as a
    low-cost, power-efficient, high-performance microprocessor for communication applications
    and hand-held devices. It is built on an ARM9 core and provides bus interfaces and
    peripherals including IIC, IIS, an MMU (Memory Management Unit), 4-channel DMA, a 2-channel
    USB controller and an LCD controller, which makes it fully adequate as the hardware base for
    the VoIP application.

    The audio codec is the most important component of the embedded platform for this
    application, and the Philips UDA1341 is used for speech capture and playback. Figure 1 shows
    the wiring diagram of the S3C2410 and UDA1341. The two chips are connected by the IIS and L3
    buses, and the codec captures and plays audio under the control of the S3C2410.

    IIS (Inter-IC Sound) is a serial bus interface for connecting digital audio devices. Its
    distinctive feature is that clock signals are carried separately from data signals, which
    substantially reduces jitter and distortion during digital/analog conversion and allows the
    codec to deliver high sound definition. In addition, IIS data passes through a FIFO served
    by DMA, so that data is sent and received synchronously and the UDA1341 can record and play
    back voice at high speed.

    L3 is the built-in control bus interface of the UDA1341. It connects the UDA1341 to the
    S3C2410 through three general-purpose GPIO pins and lets the processor control the codec's
    signal timing and operating mode. The L3 bus is also used to control some of the codec's
    audio features, including volume adjustment, bass boost and soft mute. In addition to the
    audio chip, the development board provides an Ethernet card, a Wi-Fi card, a serial port,
    USB and other peripheral interfaces, which meet the basic communication requirements of the
    VoIP system.

    Figure-1 Wiring diagram of S3C2410 and UDA1341

    Chapter-3

    MASTER DESIGN OF VoIP SYSTEM

    3.1 General Layout

    The VoIP system in this paper is divided architecturally into two main parts: the SIP server
    and the client software. The server is responsible for locating the calling and called
    parties and establishing the communication environment before a session starts. For reasons
    of economy and real-time performance, our design moderately simplifies the server setup. The
    client's main function is to send and receive compressed speech streams. To improve
    real-time performance and make better use of UDP's advantages, voice streams travel directly
    between client terminals, with no SIP server on the transmission path. Bypassing the servers
    in this way greatly reduces the latency of a voice session and also considerably relieves
    the load on the servers.

    3.2 Pattern Layout of SIP Server

    3.2.1 Setup of SIP servers

    The SIP server section consists of three parts:

    3.2.1.1 User Agent:

    Although the UA structurally belongs to the client software section, it functions as an
    extension of the SIP server that handles all SIP requests and responses.

    3.2.1.2 Register/Location server:

    When serving as a register server, the Register/Location server dynamically maintains the
    mapping between users' logical and physical addresses. This mapping can further be used to
    support call routing devices and subscriber mobility. When it receives a location request
    from a user, the server looks up the requested user in its mapping list and returns that
    user's IP address.
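
    The two roles described above amount to writing and reading a lookup table. A minimal sketch
    follows; the class name, method names and addresses are hypothetical and are not taken from
    the system described in this paper.

```python
from typing import Dict, Optional


class LocationService:
    """Toy register/location server: logical address -> current physical address."""

    def __init__(self) -> None:
        self._bindings: Dict[str, str] = {}

    def register(self, logical: str, physical: str) -> None:
        # A later registration replaces the earlier one, so a user who moves
        # to a new IP address stays reachable under the same logical name.
        self._bindings[logical] = physical

    def locate(self, logical: str) -> Optional[str]:
        # Returns None when the user has never registered.
        return self._bindings.get(logical)


service = LocationService()
service.register("sip:bob@example.com", "192.0.2.10:5060")
print(service.locate("sip:bob@example.com"))
service.register("sip:bob@example.com", "192.0.2.77:5060")  # user moved
print(service.locate("sip:bob@example.com"))
```

    Re-registering under the same logical address is what provides the subscriber mobility
    mentioned in the text.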

    3.2.1.3 Proxy/Redirect Server:

    In most circumstances the proxy server waits for incoming requests from clients and relays
    them to the appropriate resource server according to its routing policy. After the resource
    server responds, the requested content is sent back to the originating client. If a user
    registered with the server moves to a new location or changes its IP address, the SIP
    redirection function is activated: the server obtains the user's new IP address by
    consulting the relevant servers and returns it. A VoIP system commonly sets up a separate
    server for redirection, and when a redirect request occurs the proxy queries this redirect
    server for the user's new destination address. Although a dedicated server can simplify
    billing management and overall control of the VoIP system, it also increases construction
    cost and response time. In this paper, the proxy and redirection software are deployed
    together on one server machine in order to reduce both equipment cost and response time.
    Cost falls because fewer machines are involved, and real-time performance improves because
    most redirection requests come from the proxy server, and local accesses are far faster than
    remote ones.

    3.2.2 Messaging System of SIP Servers

    Figure 2 shows the messaging system of the SIP servers. Solid lines represent SIP requests
    and dashed lines represent acknowledgements.

    Figure-2 Message Flow in SIP protocol

    The initial session request from a VoIP client is first delivered from the SIP proxy to the
    location server to obtain the IP address of the called party. The location server then sends
    an invite message to the target IP (or to that IP's proxy) and returns a wait-to-confirm
    message to the calling party. If the IP address is valid and the PC client accepts the
    invite, the called party returns an acknowledgement message. Once this is confirmed, the two
    parties have exchanged their IP addresses and the communication environment is ready for
    voice transmission.

    3.3 Pattern Layout of Client Software

    3.3.1 Overall Structure of Client-side

    At the user level, the client software offers the communication control interface, while
    background procedures handle all sound processing and data transmission. The software
    structure of the client comprises three parts, as shown in Figure 3.

    Figure-3 Integral Architecture of Client

    3.3.1.1 GUI interface:

    The GUI is the graphical channel between the user and the primary control thread. Through
    it, a VoIP user can control the background threads and check their status whenever
    necessary.

    3.3.1.2 Voice processing:

    The threads in the voice processing module are in charge of all VoIP sound operations,
    including capturing, playing, compressing and decompressing. The work divides into two
    sub-procedures according to their order of operation: one samples, compresses and sends; the
    other receives, decompresses and plays.

    3.3.1.3 Data channel:

    The client software provides two communication channels, each with its own thread, to carry
    all the data generated by the user's voice and operations. The SIP communication thread (the
    client-to-server thread), implemented with the open-source osip/eXosip library, interacts
    with the SIP proxy to help the client build the initial session connection [6]. Functionally,
    this client-to-server thread also corresponds to the user agent in the SIP server section.
    The RTP communication thread (the client-to-client thread) is built on the oRTP library and
    gives a user direct RTP access to the other party. Unlike ordinary VoIP solutions, the
    system in this paper gives RTP communication an independent thread. Although this raises
    implementation complexity, the benefit outweighs the cost: voice packets sent by one user
    bypass the SIP servers and reach the other client directly, which markedly reduces latency
    during a session.

    After the voice processing thread finishes, the compressed voice data is passed to the RTP
    thread and encapsulated into IP packets for delivery. The relationship between these two
    threads clearly follows the producer-consumer model, and an effective way to improve
    throughput between them is to set up a shared critical area. Both threads can work on this
    shared area concurrently, which greatly shortens system idle time.
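
    The producer-consumer relationship can be sketched with a bounded queue acting as the shared
    area. This is an illustration only: it uses Python's standard threading and queue modules
    rather than the system's own code, the function names and the buffer capacity of 8 are
    hypothetical, and the "send" step simply collects frames in a list.

```python
import queue
import threading


def voice_thread(frames, buf):
    """Producer: hand each compressed frame to the shared buffer."""
    for frame in frames:
        buf.put(frame)   # blocks only while the buffer is full
    buf.put(None)        # sentinel: no more frames


def rtp_thread(buf, sent):
    """Consumer: take frames from the shared buffer and 'send' them."""
    while True:
        frame = buf.get()  # blocks only while the buffer is empty
        if frame is None:
            break
        sent.append(frame)  # stand-in for packetizing and sending


def run_pipeline(frames, capacity=8):
    buf = queue.Queue(maxsize=capacity)  # shared area with internal locking
    sent = []
    producer = threading.Thread(target=voice_thread, args=(frames, buf))
    consumer = threading.Thread(target=rtp_thread, args=(buf, sent))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return sent


print(run_pipeline([b"frame0", b"frame1", b"frame2"]))
```

    Because neither thread waits unless the buffer is full or empty, the encoder can keep
    working while earlier frames are still being sent.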

    3.3.3 Implementation of RTP Transmission

    The RTP module is implemented on top of the oRTP software package, as shown in Figure 5.

    Figure-5 Integral Design of the RTP Module

    The primary control module sits at the center of the RTP thread; it monitors the other
    sub-modules and keeps them working in order. The RTP module's main function is to send
    processed voice to, and receive it from, the other side of the session. It also generates
    RTP statistics and QoS information for the subsequent RTCP quality check. By periodically
    exchanging and verifying these statistics, the RTCP module can detect abnormal situations
    and report them to the primary control module. If necessary, the control module adjusts the
    RTP rate and packet load to maintain transmission QoS.

    According to the RTP standard, each RTP packet has two parts, the header and the payload.
    The payload carries the voice data, while the header carries the information needed for QoS
    maintenance. The most important item of this auxiliary information is the RTP sequence
    number (RSN), which the sending client assigns to each packet. Because the RSN increases
    with each packet sent, the RTP receiving thread can easily sort voice packets back into
    their original order. In the voice processing thread, the RSN is used together with Speex to
    make the conversation more robust. The RTP receiving thread first checks the RSNs for packet
    loss. If loss has occurred, the RTP thread tells Speex to conceal the missing fragments and
    smooth over the incomplete voice stream. With this quality assurance mechanism the system
    tolerates data loss well, even though RTP and UDP never retransmit lost packets. When the
    loss is too severe for Speex to repair (usually more than three consecutive packets), the
    RTCP module readjusts the packet size and sending rate to suit the poor network
    environment.
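
    The loss check described above can be sketched as follows. The function names and the
    returned labels are hypothetical; the threshold of three packets follows the text, and the
    16-bit wraparound follows the width of the RTP sequence number field. The sketch assumes
    packets have already been put in order and contain no duplicates.

```python
def missing_between(prev_seq, seq):
    """Packets lost between two consecutive arrivals, with 16-bit wraparound."""
    return ((seq - prev_seq) & 0xFFFF) - 1


def classify_arrival(prev_seq, seq, max_concealable=3):
    """Decide what the receiver should do when packet `seq` follows `prev_seq`."""
    missing = missing_between(prev_seq, seq)
    if missing == 0:
        return "play"      # nothing lost
    if missing <= max_concealable:
        return "conceal"   # ask the decoder to mask the gap
    return "adapt"         # loss too severe: adjust packet size and rate


print(missing_between(65535, 0))   # wraparound, nothing lost
print(classify_arrival(10, 13))    # two packets lost
print(classify_arrival(10, 15))    # four packets lost
```

    Masking the difference with 0xFFFF keeps the gap calculation correct when the sequence
    number rolls over from 65535 to 0.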

    Chapter-4

    SYSTEM TEST AND EXPERIMENTAL RESULTS

    Session quality and bandwidth occupancy are two of the most important indicators for
    evaluating a VoIP system. The former is subjective and therefore requires listeners to rate
    it. In this paper, session quality is measured by the mean opinion score (MOS), a numerical
    indication, specified in ITU-T P.800, of perceived voice quality after compression and
    transmission. The MOS is a single number from 1 to 5, where 1 is the lowest perceived audio
    quality and 5 the highest. The MOS test also requires at least 15 listeners, and the final
    result is the arithmetic mean of the individual scores. A VoIP system is generally
    considered to provide high-quality voice communication if its MOS is better than 4. In our
    MOS test, a PC and an embedded terminal are set up as the session participants. With 96 kbps
    PCM as the voice input, the PC terminal varies the compression sampling rate from 8 kHz to
    32 kHz and captures the network traffic as a reference for bandwidth occupancy. The test
    results are shown in Table 1.
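
    The MOS calculation itself is a plain arithmetic mean. A minimal sketch follows; the
    function name is hypothetical, the minimum of 15 listeners is the figure stated in the text
    above, and the sample scores are invented for illustration and are not results from this
    paper.

```python
def mean_opinion_score(scores, min_listeners=15):
    """Arithmetic mean of listener scores, each on the 1 to 5 scale."""
    if len(scores) < min_listeners:
        raise ValueError("not enough listeners for a MOS test")
    if any(s < 1 or s > 5 for s in scores):
        raise ValueError("each score must be between 1 and 5")
    return sum(scores) / len(scores)


# Invented scores from 15 hypothetical listeners.
example_scores = [4] * 10 + [5] * 3 + [3] * 2
print(round(mean_opinion_score(example_scores), 2))
```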

    Table-1 Statistics of the quality in VoIP

    Thanks largely to CELP and RTP, the VoIP system performs very well in voice compression
    (a compression rate of 38 at best), bandwidth conservation and average system utilization.
    It provides high-quality voice communication (MOS 4.0) at a minimum bit rate of only
    3.9 kbps. The test also shows that the compression sampling rate has little influence on
    speech clarity. Apart from perceptible but not annoying noise at 8 kHz, voice at all three
    sampling rates is fluent and intelligible. Overall, the system performs well in both bit
    rate and voice quality, fulfilling the goal of designing a real-time voice system on an
    embedded platform.

    Chapter-5

    CONCLUSION

    This paper has described the design and implementation of an embedded VoIP system in detail.
    The system uses the S3C2410 and UDA1341 as its hardware base, supports the SIP and RTP
    Internet protocols, and employs the ALSA sound driver and CELP compression to ensure sound
    quality. To strengthen real-time and QoS performance, improvements to the server setup and
    voice processing were attempted and investigated. Finally, as the test results show, the
    system achieves a compression rate of 38 at best and provides high-quality voice
    communication with a bandwidth of only 3.8 kbps.

    REFERENCES

    [1] Samrat Ganguly, Sudeept Bhatnagar, "VoIP: Wireless, P2P and New Enterprise Voice over
    IP," England: Wiley, 2008.

    [2] Goode B., "Voice over Internet protocol," Proceedings of the IEEE, 2002, 90(9): 1495.

    [3] M. Handley, H. Schulzrinne, E. Schooler, et al., "SIP: Session Initiation Protocol,"
    IETF (RFC 3261), June 2002.

    [4] H. Schulzrinne, S. Casner, R. Frederick, et al., "RTP: A Transport Protocol for
    Real-Time Applications," IETF, January 1996.

    [5] Wei Zheng, "The Research and Design of an Embedded VoIP System Based on SIP
    Protocol," Dahan: Dahan University of Technology, 2008.

    [6] Javier Bustos J., Alejandro Bassi A., "Voice compression systems for wireless
    telephony," 21st International Conference of the Chilean Computer Science Society
    (SCCC 2001), Punta Arenas, Chile.

    [7] Rui Wang, Shiyuan Yang, "The design of a rapid prototype platform for ARM based
    embedded system," IEEE Transactions on Consumer Electronics, 2004, 50(2): 746-751.

    [8] Gurbani V., Sun Xianhe, "Extensions to an Internet signaling protocol to support
    telecommunication services," IEEE Communications Magazine, 2004, 38(10): 53-59.