
Improving Perceived Speech Quality for Wireless

VoIP By Cross-Layer Designs

By Zhuoqun Li

This dissertation is submitted to the University of Plymouth

in partial fulfilment of the award of

Master of Research in Network System Engineering

Supervisor

Prof. Emmanuel C. Ifeachor

School of Computing, Communication and Electronics

University of Plymouth

September 2003

http://www.plymouth.ac.uk/


ABSTRACT

Providing VoIP services with satisfactory speech quality over the wireless/mobile Internet is difficult because of impairment factors introduced in the wireless channel, such as packet errors, delay and jitter. Effective packet error recovery mechanisms in wireless networks, such as Automatic Repeat reQuest (ARQ), are important because they can reduce the packet loss caused by bit errors. This dissertation focuses on using cross-layer techniques to improve the performance of ARQ, and hence the perceived speech quality of wireless VoIP, an improvement that may be difficult to achieve within a strictly layered protocol structure. The research for this project was carried out in two steps:

First, we use an objective measure of perceived conversational speech quality (MOSc) as a metric to evaluate the performance of three current retransmission schemes, namely No Retransmission, Speech Property-Based Retransmission and Full Retransmission. Our findings indicate that the performance of the retransmission mechanisms is a function of both the wireless link quality and the delay introduced in the wireline network. We also propose a perceived speech quality driven retransmission mechanism, which automatically switches to the most suitable retransmission scheme according to QoS parameters reported from different layers.

Next, we investigate the problems introduced by the retransmission procedure of the Stop and Wait ARQ protocol in a wireless VoIP system. We then propose a cross-layer framework in which: 1) the retransmission procedure of the link layer ARQ protocol is constrained by the available playout delay; 2) in the playout delay estimation, the delivery delays in the wireless channel and in the wireline network are estimated separately, and the delivery delay in the wireless channel is constrained to avoid delay accumulation in the transmitting queue; 3) if the retransmission procedure is terminated prematurely, the received noisy copies of a speech packet are combined to reduce the damaged part, and the result is finally played out at the application layer.

Simulation results show that these cross-layer designs improve the performance of the Stop and Wait ARQ protocol and hence significantly enhance the perceived speech quality of a wireless VoIP system.


TABLE OF CONTENTS

ABSTRACT

TABLE OF CONTENTS

LIST OF FIGURES

LIST OF TABLES

ACKNOWLEDGEMENTS

CHAPTER 1 INTRODUCTION

1.1 VoIP and Its Application in Wireless Internet

1.2 Motivation

1.2.1 Impairment factors of wireless VoIP speech quality

1.2.2 Packet error concealment techniques

1.2.3 Cross-layer designs

1.2.4 Problem statement

1.3 Aims and Objectives

1.4 Thesis Contributions

1.5 Organization of the Thesis

CHAPTER 2 BACKGROUND THEORIES

2.1 Speech Quality Evaluations

2.1.1 Objective Speech Quality Measurement

2.1.2 PESQ

2.1.3 E-Model

2.1.4 Conversational speech quality evaluation

2.2 Adaptive Playout Buffer

2.3 Automatic Repeat upon reQuest (ARQ)

CHAPTER 3 PERCEIVED SPEECH QUALITY DRIVEN RETRANSMISSION MECHANISM

3.1 Introduction

3.2 Related Works

3.2.1 Speech property-based retransmission mechanisms

3.2.2 Measuring conversational speech quality

3.2.3 Adaptive jitter buffer and retransmission jitters

3.3 Simulation System Description

3.4 Performance Comparison of Current Retransmission Schemes

3.5 Perceived Speech Quality Driven Retransmission Scheme

3.6 Summary

CHAPTER 4 PLAYOUT DELAY CONSTRAINED ARQ AND ARQ AWARE PLAYOUT BUFFER

4.1 Introduction

4.2 The Cross-Layer Design

4.2.1 System model

4.2.2 Playout delay constrained ARQ

4.2.3 ARQ aware playout buffer

4.2.3.1 Queue model

4.2.3.2 ARQ aware playout buffer

4.3 Simulation Model and Experimental Results

4.3.1 Wireless channel model

4.3.2 Voice traffic model

4.3.3 Speech quality evaluation

4.3.4 Simulation results and analysis

4.4 Summary

CHAPTER 5 DISCUSSIONS, SUGGESTIONS FOR FURTHER WORKS, AND CONCLUSIONS

5.1 Discussions

5.2 Suggestions for Further Works

5.3 Conclusions

REFERENCES

APPENDICES

[APPENDIX A] ns-2 Extensions for ARQ Retry Limit Control

[APPENDIX B] ns-2 Simulation Script for Per Packet Control of ARQ

[APPENDIX C] C code for Majority-Logic Packet Combining

[APPENDIX D] List of Items Included in the Appended CD

[APPENDIX E] Published Papers

LIST OF FIGURES

Figure 1-1 VoIP Protocol Architecture

Figure 1-2 The wireless VoIP system overview

Figure 1-3 The basic model of cross-layer designs

Figure 2-1 Basic Structure of Perceptual Evaluation of Speech Quality

Figure 2-2 Schematic diagram for MOSc measurement

Figure 2-3 Timing associated with packet i

Figure 3-1 Simulation Environment

Figure 3-2 Overall packet loss rate comparison

Figure 3-3 Buffered Retx delay comparison

Figure 3-4 MOSc comparison with 175 ms network delay

Figure 3-5 MOSc comparison with packet error probability 0.001

Figure 3-6 Perceived speech quality driven Retx scheme pseudo code

Figure 4-1 Stop and Wait ARQ

Figure 4-2 The cross-layer design system model

Figure 4-3 Block diagram of the playout delay constrained ARQ with packet combining

Figure 4-4 Timing associated with packet

Figure 4-5 The simulation model

Figure 4-6 Overall packet losses comparison

Figure 4-7 End-to-end delays with different inter-arrival delay

Figure 4-8 End-to-end delay comparison

Figure 4-9 Conversational MOS comparison

Figure 5-1 Perceived speech quality driven packet error recovery scheduler

LIST OF TABLES

Table 2-1 MOS scale

Table 3-1 Average voiced packet losses with fast-exp playout buffer

ACKNOWLEDGEMENTS

I would like to express my sincere and deep gratitude to my supervisor, Professor Emmanuel C. Ifeachor, who gave me the opportunity to undertake the Master of Research. His continuous advice and encouragement throughout this study are acknowledged and greatly appreciated.

I also had the opportunity to work with researchers in the Centre for Signal Processing and Multimedia Communications, and I would like to thank them for their friendliness and support. Special thanks go to Ms. Lingfen Sun and Mr. ZiZhi Qiao for their valuable comments and suggestions. Without their support, this thesis would not have been possible.

I would like to acknowledge all my classmates in MRes/MSc NSE and CE&SP for their generous help and inspiration. With them, I really enjoyed the past year at the University of Plymouth.

On the personal side, I would like to thank my parents for their unending love and support.

Improving Perceived Speech Quality for Wireless VoIP by Cross-layer designs

CHAPTER 1

INTRODUCTION

1.1 VoIP and Its Application in Wireless Internet

Packet-switched networks such as the Internet have developed very rapidly over the past decades. Their advantages, such as efficiency and flexibility, are expected to make them eventually replace traditional circuit-switched networks, i.e. the Public Switched Telephone Network (PSTN). VoIP (Voice over Internet Protocol, or Voice over Packet) is one of the success stories among packet network applications. In general, VoIP service is the real-time delivery of packetized voice traffic across packet-switched networks such as the Internet. It offers lower communication costs and acceptable speech quality compared with traditional telephone networks.

Recently, wireless/mobile communication has been growing rapidly and providing increasingly convenient services. It is therefore not surprising that there is great demand to add voice services to wireless IP networks and wireless handsets. Wireless VoIP services can be provided over a Wireless Local Area Network (WLAN), e.g. an IEEE 802.11 [1] network, or over a third generation (3G) mobile network, e.g. WCDMA [2]. The protocol stack used to transmit VoIP traffic in wireline and wireless networks is presented in Figure 1-1.

MRes Thesis –University of Plymouth 1


At the application layer, VoIP is supported by RTP (Real-time Transport Protocol) [3]. RTP provides a way to deliver delay-sensitive real-time data. The services provided by RTP include payload type identification, sequence numbering, timestamping and delivery monitoring. RTP applications typically run on top of UDP, which does not guarantee Quality of Service (QoS) but requires lower overhead [4].
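To make the RTP services above concrete, the sketch below packs and unpacks the 12-byte fixed RTP header defined in RFC 3550 (version, marker, payload type, sequence number, timestamp, SSRC) with Python's struct module. It is an illustration only: payload type 0 (PCMU/G.711 with an 8 kHz clock, per RFC 3551), the SSRC value and the 20 ms packet interval are assumptions chosen for the example, and CSRC lists, padding and header extensions are omitted.

```python
import struct

RTP_VERSION = 2

def pack_rtp_header(seq: int, timestamp: int, ssrc: int,
                    payload_type: int = 0, marker: bool = False) -> bytes:
    """Pack a minimal 12-byte RTP fixed header (RFC 3550, no CSRCs, no extension)."""
    # First octet: V (2 bits) | P (1 bit) | X (1 bit) | CC (4 bits); P, X and CC are zero here.
    byte0 = RTP_VERSION << 6
    # Second octet: M (1 bit) | PT (7 bits).
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    # Network byte order: 16-bit sequence number, 32-bit timestamp, 32-bit SSRC.
    return struct.pack("!BBHII", byte0, byte1, seq & 0xFFFF,
                       timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

def unpack_rtp_header(header: bytes) -> dict:
    """Recover the fields used for payload identification, ordering and timing."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", header[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }

# Assumed example: 8 kHz sampling clock and 20 ms packets give 160 timestamp units per packet.
SAMPLES_PER_PACKET = 8000 * 20 // 1000

headers = [pack_rtp_header(seq=n, timestamp=n * SAMPLES_PER_PACKET, ssrc=0x1234ABCD,
                           marker=(n == 0))  # for audio, M usually flags a talkspurt's first packet
           for n in range(3)]
for h in headers:
    print(len(h), unpack_rtp_header(h))
```

The receiver relies on the sequence number to detect loss and reordering and on the timestamp to schedule playout, which is why both are fixed-width fields in every packet.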

RTCP (RTP Control Protocol) is the control protocol associated with RTP. RTCP monitors the quality of service and conveys information about the participants in an on-going session [3]. After a voice sample has been digitised and compressed, it is carried, within RTP and UDP, as the payload of an IP packet whose header contains the IP addresses used for routing in IP networks. At the link layer, IP packets carrying speech data are encapsulated in frames, supported by IEEE 802.3 [4] in wireline networks and by IEEE 802.11 in wireless networks. Both of these link layer protocols provide services such as framing, error control and flow control.

Application Layer: RTP, RTCP

Transport Layer: UDP

Network Layer: IP

Data Link Layer: IEEE 802.3 (wireline), IEEE 802.11x (wireless)

Figure 1-1 VoIP Protocol Architecture




Figure 1-2 describes a VoIP system implemented over the wireless Internet. Speech is an analog signal that varies slowly in time (with a bandwidth not exceeding 4 kHz). As depicted in Figure 1-2, the speech source alternates between talk and silence periods, whose durations are typically assumed to be exponentially distributed. Before being transmitted over packet-switched networks, the analog speech signal has to be digitised at the sender; the reverse process is performed at the receiver. Digitisation consists of sampling, quantization and encoding. Many encoding techniques have been developed and standardized by the ITU. The basic encoder is ITU G.711, which samples the voice signal at 8 kHz and generates 8 bits per sample. Code Excited Linear Prediction (CELP) based encoders reduce the bit rate (e.g. 8 kbit/s for G.729, and 5.3 and 6.3 kbit/s for G.723.1) at the expense of lower quality, additional complexity and encoding delay [5]. For wireless/mobile communication, variable-rate codecs have been developed, e.g. AMR [6] and EVRC [7].
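The bit rates above follow from simple arithmetic. The sketch below derives the G.711 rate from its sampling parameters and compares it with the CELP rates, using G.729 at 8 kbit/s and the two standard G.723.1 rates of 5.3 and 6.3 kbit/s. Expressing every rate as bytes per 20 ms is an assumption made only for comparison; the codecs' native frame lengths differ.

```python
def pcm_bit_rate(sample_rate_hz: int, bits_per_sample: int) -> int:
    """Bit rate of plain PCM coding such as G.711: samples per second x bits per sample."""
    return sample_rate_hz * bits_per_sample

def bytes_per_frame(bit_rate_bps: float, frame_ms: int) -> float:
    """Encoded speech produced during one frame of the given duration, in bytes."""
    return bit_rate_bps * frame_ms / 1000 / 8

G711_BPS = pcm_bit_rate(8000, 8)  # 8 kHz x 8 bits = 64 000 bit/s

# Rates in bit/s; the common 20 ms interval is an assumption for comparison only.
codecs = {"G.711": G711_BPS, "G.729": 8000, "G.723.1 (6.3k)": 6300, "G.723.1 (5.3k)": 5300}
for name, rate in codecs.items():
    print(f"{name:15s} {rate / 1000:5.1f} kbit/s  "
          f"{bytes_per_frame(rate, 20):6.2f} bytes per 20 ms  "
          f"ratio to G.711: 1/{G711_BPS / rate:.1f}")
```

The table this prints makes the trade-off in the text tangible: G.729 needs one eighth of the G.711 rate, paid for with codec complexity, delay and some quality.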

The encoded speech is then packetized into packets of equal size. Each packet includes the headers of the various protocol layers (e.g. RTP 12 bytes, UDP 8 bytes, IP 20 bytes and 802.11 34 bytes) and a payload carrying the encoded speech for a duration that depends on the codec deployed (e.g. 20 ms for an AMR 12.2 kbit/s frame).
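Because each voice packet is small, the per-packet headers listed above dominate the transmitted size. The sketch below adds up the header sizes from the text for one AMR 12.2 kbit/s frame of 20 ms (244 speech bits). It assumes the frame is padded to whole bytes and ignores the AMR RTP payload-format header and the 802.11 physical-layer preamble, so the figures are illustrative rather than exact.

```python
import math

# Header sizes in bytes, as listed in the text.
HEADERS = {"RTP": 12, "UDP": 8, "IP": 20, "802.11": 34}

def packet_budget(codec_bps: int, frame_ms: int, headers: dict) -> dict:
    """Size of one voice packet carrying a single codec frame, and the resulting link rate."""
    payload_bits = codec_bps * frame_ms // 1000
    payload_bytes = math.ceil(payload_bits / 8)  # assumption: frame padded to whole bytes
    header_bytes = sum(headers.values())
    total_bytes = payload_bytes + header_bytes
    packets_per_s = 1000 // frame_ms
    return {
        "payload_bytes": payload_bytes,
        "header_bytes": header_bytes,
        "total_bytes": total_bytes,
        "payload_efficiency": payload_bytes / total_bytes,
        "link_rate_bps": total_bytes * 8 * packets_per_s,
    }

amr = packet_budget(codec_bps=12200, frame_ms=20, headers=HEADERS)
print(amr)
```

Under these assumptions, less than a third of each packet is speech, and the 12.2 kbit/s codec needs about 42 kbit/s on the wireless link. Every retransmission repeats the full header overhead.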

In this study, the wireless VoIP system is considered in a last-hop scenario. In this case, voice streams traverse wireline networks before they reach the access point, which is the junction between the wireline network and the wireless channel.

Figure 1-2 The wireless VoIP system overview (sender: speech source alternating between talk and silence, encoder, packetizer; Internet; access point; receiver: playout buffer, depacketizer, decoder)



As the voice packets are sent over IP networks and the wireless channel, they incur variable delay and possibly loss. To provide smooth playout, a playout buffer is used at the receiver to compensate for the delay variations. Packets are held until a later playout time to ensure that enough packets are buffered to be played out continuously. Any packet arriving after its scheduled playout time is discarded. There are two types of playout algorithms: fixed and adaptive. A fixed playout scheme schedules the playout of packets so that the end-to-end delay (including both network and buffering delay) is the same for all packets. Fixed jitter buffers cannot readily adapt to changes in network delay and, as a result, are not practical in real VoIP applications. Adaptive playout schemes are more common in VoIP systems: an adaptive playout buffer can adjust the playout delay for each talkspurt and is therefore better suited to time-varying IP networks. The scheduled playout delay is a trade-off between buffer losses and end-to-end delay, and it is important to select its value so as to maximize the quality of voice communications. A large playout delay decreases packet loss due to late arrivals but hinders interactivity between the communicating parties, while a small playout delay improves interactivity but causes higher buffer losses and degrades speech quality.

The playout buffer delivers a continuous stream of packets at fixed intervals to the depacketizer, whose responsibility is to extract speech data from the payload and feed it to the decoder. The main function of the decoder is to reconstruct the speech signal. Some decoders implement packet loss concealment (PLC) methods that produce replacements for lost packets. Having been depacketized and decoded, the speech signal is finally played out by the VoIP end device.

1.2 Motivation

1.2.1 Impairment factors of wireless VoIP speech quality

The perceived speech quality of VoIP is defined subjectively, as perceived by the end users. Despite the cost savings of VoIP, providing acceptable perceived speech quality is the key to the success of VoIP services. Currently, IP telephony still cannot provide fully satisfactory quality because of the many impairment factors introduced along the transmission path over IP networks. When VoIP is applied in wireless/mobile IP networks, the unreliability of the wireless channel and the unpredictable mobility of wireless handsets degrade speech quality further. Many correlated impairment factors may seriously affect the perceived speech quality of wireless VoIP. In this study, the main impairment factors are identified as packet loss, bit errors, end-to-end delay, jitter and coding.
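The adaptive playout buffer described above is the receiver-side mechanism that trades the variable delay and jitter just listed against late-packet loss. The sketch below shows one widely used form of adaptive playout estimation: exponentially weighted averages of the one-way network delay and of its variation, updated per packet and applied at the start of each talkspurt. The smoothing factor 0.998002 and the safety factor 4 are values commonly quoted for this estimator in the literature and are assumptions here, not parameters taken from this thesis; the delay values in the usage lines are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class PlayoutEstimator:
    """Exponentially weighted playout delay estimator (alpha and k are assumed values)."""
    alpha: float = 0.998002
    k: float = 4.0
    d_hat: float = 0.0   # smoothed one-way network delay (ms)
    v_hat: float = 0.0   # smoothed delay variation (ms)
    initialised: bool = False

    def update(self, network_delay_ms: float) -> None:
        """Fold one packet's measured network delay into the running estimates."""
        if not self.initialised:
            self.d_hat, self.v_hat, self.initialised = network_delay_ms, 0.0, True
            return
        a = self.alpha
        self.d_hat = a * self.d_hat + (1 - a) * network_delay_ms
        self.v_hat = a * self.v_hat + (1 - a) * abs(self.d_hat - network_delay_ms)

    def playout_delay(self) -> float:
        """Playout delay to apply to the first packet of the next talkspurt."""
        return self.d_hat + self.k * self.v_hat

def late_losses(delays_ms, playout_delay_ms: float) -> int:
    """Packets whose network delay exceeds the scheduled playout delay are discarded."""
    return sum(1 for d in delays_ms if d > playout_delay_ms)

est = PlayoutEstimator()
for d in [80, 95, 120, 85, 140, 90, 100]:  # hypothetical one-way delays (ms)
    est.update(d)
pd = est.playout_delay()
print(round(pd, 2), late_losses([85, 110, 150, 95], pd))
```

A larger k lowers late losses at the cost of extra end-to-end delay, which is exactly the trade-off described for the scheduled playout delay.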

Packet Loss

Packet loss is a major impairment factor. It causes more noticeable degradation in voice quality than any other impairment factor. During their trips through interconnected IP networks, speech packets may be lost due to router buffer overflow or link congestion. In addition, VoIP applications are carried by the connectionless UDP/IP protocols, which means speech packets may travel over different paths through the IP networks before they arrive at the destination. As a result, some speech packets arrive out of sequence and are discarded at the receiver. Lost packets may be reconstructed by the decoder from related information, but it is impossible to completely recover the speech information carried by lost packets.

Bit Error

Bit errors are not really a problem for VoIP in wireline networks, as they do not happen very often. However, if wireless channels are included in the path traversed by speech packets, bit errors become a challenging problem. In the wireless environment, the transmitted signal is exposed to absorption, scattering, interference and multipath fading. All these effects affect the Signal to Noise Ratio (SNR) at the receiver and hence determine the Bit Error Rate (BER). For packet communications, a bit error results in packet loss if the whole packet is covered by a checksum. However, if a partial checksum is used specifically for VoIP applications, speech packets containing bit errors in the payload are still decoded and played out. In this case, the effect of bit errors on the perceived speech quality is determined by the positions and number of the bit errors.

End-to-end Delay

Delay does not directly cause any reduction in speech information, but it affects the interactive nature of conversations. The end-to-end delay encompasses: a. the delay incurred in encoding and decoding; b. the delay incurred in packetization; c. the delay incurred along the path from the sender to the receiver (e.g. transmission time over IP networks, queuing delays in network elements, and propagation and retransmission time in the wireless channel); d. the delay incurred in the playout buffer. Delays lower than 100 ms cannot really be noticed by most users, while delays between 100 ms and 300 ms begin to affect conversational interactivity [9]. Longer delays are obvious to the user and make natural conversation impossible.
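Because the delay components a to d simply add up, an end-to-end delay budget can be checked against the thresholds quoted from [9]. The component values in the sketch below are hypothetical numbers chosen for illustration, not measurements from this study.

```python
def end_to_end_delay(codec_ms: float, packetization_ms: float,
                     network_ms: float, playout_ms: float) -> float:
    """Sum of components a-d: coding, packetization, transmission path, playout buffer."""
    return codec_ms + packetization_ms + network_ms + playout_ms

def interactivity(delay_ms: float) -> str:
    """Coarse classification using the 100 ms / 300 ms thresholds quoted in the text."""
    if delay_ms < 100:
        return "not noticeable"
    if delay_ms <= 300:
        return "interactivity affected"
    return "severely impaired"

# Hypothetical budget for a last-hop wireless call (all values in ms); the network term
# would include wireline transit plus any wireless retransmission time.
total = end_to_end_delay(codec_ms=25, packetization_ms=20, network_ms=150, playout_ms=60)
print(total, interactivity(total))
```

The budget view shows why wireless retransmissions are costly: every retry adds to the network term, and the playout buffer must grow to absorb the resulting jitter.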

Jitter

Jitter is defined as the variation in the delay of received packets. At the sending side, packets are sent in a continuous stream with the packets spaced evenly apart. Due to network congestion, improper queuing or configuration errors, the interval between adjacent packets changes constantly, so the delay of each packet varies instead of remaining constant. Jitter can make voice very annoying to the listener. Removing jitter requires collecting packets and holding them long enough for the slowest packets to arrive in time to be played in the correct sequence, re-sequencing them if necessary. This job is normally performed by the playout buffer, which maintains constant packet intervals at the expense of additional playout delay or of losing packets that do not arrive in time.

Coding

In the process of transforming the analog speech signal into digital bit streams, some codecs also use compression techniques to remove redundant or less important speech information, reducing the transmission bandwidth requirement while preserving perceptually important voice signals. This procedure loses a certain amount of speech information and hence affects the speech quality perceived by the user at the receiving side. For wireless VoIP, speech quality can also be affected by the error-correction mechanisms used by the codecs.

1.2.2 Packet error concealment techniques

Packet errors due to packet loss or bit errors have been a critical impairment factor for the perceived speech quality of wireless VoIP. Many packet error concealment techniques have been developed and improved with great effort, but these techniques are far from perfect and may not even work properly in new communication environments such as the growing wireless/mobile Internet. Some of the main packet error recovery methods are described below:

Forward Error Correction

Forward Error Correction (FEC) [11] enables lost data to be recovered at the receiver without further reference to the sender. Both the original data and redundant information are transmitted to the receiver. The redundant information can be either independent of or dependent on the media stream. Media-independent FEC does not need to know the original data type: the original data are simply transmitted together with some redundant data. In media-dependent (media-specific) FEC, if an original data packet is lost, redundant data packets, which are related to the specific media, are used to recover the loss. Usually, the redundant packet is produced using a lower-bandwidth encoding method than the primary encoding, which results in lower quality than the original. The costs of using FEC are reduced bandwidth efficiency and increased end-to-end delay, since the redundant information is transmitted after the packet it protects.

Interleaving

Interleaving has been widely used in mobile networks to distribute burst frame errors across several channels. In VoIP applications, if the size of a data unit produced at a time by a coder is smaller than the allowed payload size of a packet, several data units may be combined into a single packet. However, to reduce the effects of packet loss, or of burst bit errors in the wireless environment, the original data units are not combined in the sequential order produced by the coder; instead, they are interleaved by the transmitter. A lost packet then produces several small gaps rather than one long one, and each gap typically corresponds to a speech interval considerably shorter than a phoneme. Therefore, listeners are able to mentally interpolate the gaps, and speech intelligibility is not decreased.

UDP Lite

UDP Lite [15] is designed for applications that prefer to have damaged data delivered rather than discarded by the network. For VoIP over wireless, it is not necessary to discard speech frames that contain only a few bit errors. At the IP layer, the IP header checksum does not cover the IP payload; however, the UDP checksum covers the entire datagram, including the media payload. In real network applications, it is the application layer, not the transport layer, that knows best what should be verified by the checksum. UDP Lite therefore provides a checksum with optional partial coverage.

Automatic Retransmission reQuest

In Automatic Retransmission reQuest (ARQ) [16], when the receiver cannot correctly receive a packet, the sender retransmits it, possibly several times. ARQ-based schemes mainly consist of three parts: a. detection of lost data, by the receiver or by the sender (timeout); b. an acknowledgment strategy, in which the receiver sends acknowledgments indicating which data have been received or which data are missing; c. a retransmission strategy, which determines which data are retransmitted by the sender. Although ARQ is robust and efficient against burst losses, it also brings a series of problems to real-time applications with delay constraints.

1.2.3 Cross-layer designs

IP networks have been successfully supported by the layered protocol architecture since their early development. However, for real-time applications such as wireless VoIP, the layered architecture may prevent them from adapting readily to instantaneous changes in the communication environment, and can consequently impact their performance seriously. Examples of system performance degradation due to a lack of cooperation among different layers are given in [18]. Solutions to the problems introduced by the layered protocol architecture have been developed under the name of the cross-layer approach, or cross-layer design. The objective of cross-layer designs is to achieve efficient QoS support and network resource allocation through joint-layer techniques, such as QoS knowledge sharing and cooperation of QoS mechanisms among different layers (see Figure 1-3). The performance of future networks may be enhanced by such cross-layer designs between the PHY, MAC and higher layer protocols.

Figure 1-3 The basic model of cross-layer designs (QoS information mapping and joint-layer QoS techniques)
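As one concrete instance of QoS knowledge sharing, the link layer could use a delay budget reported by the application-layer playout buffer to decide how many transmission attempts of a Stop and Wait ARQ are still worthwhile. The sketch below is hypothetical: the record name SharedQoS, its fields, the fixed per-attempt time and the independence assumption for packet errors are illustrative choices, not the design evaluated in later chapters.

```python
from dataclasses import dataclass

@dataclass
class SharedQoS:
    """Hypothetical cross-layer record; each field is written by a different layer."""
    playout_deadline_ms: float  # application layer: time left before the packet's playout
    attempt_time_ms: float      # link layer: one transmission plus waiting for the ACK
    packet_error_prob: float    # link/physical layer: current packet error probability

def attempts_allowed(q: SharedQoS, link_retry_limit: int) -> int:
    """Attempts that can finish before the playout deadline, capped by the link
    layer's own retry limit (the original transmission counts as one attempt)."""
    fits = int(q.playout_deadline_ms // q.attempt_time_ms)
    return max(0, min(fits, link_retry_limit + 1))

def delivery_probability(q: SharedQoS, link_retry_limit: int) -> float:
    """Probability that at least one allowed attempt succeeds, assuming independent errors."""
    n = attempts_allowed(q, link_retry_limit)
    return 1.0 - q.packet_error_prob ** n

q = SharedQoS(playout_deadline_ms=25, attempt_time_ms=6, packet_error_prob=0.1)
print(attempts_allowed(q, link_retry_limit=7), delivery_probability(q, link_retry_limit=7))
```

Without the shared deadline, the link layer would spend all eight attempts permitted by its retry limit even though any copy arriving after the fourth would be discarded by the playout buffer. The extra attempts only add queuing delay for the packets behind it.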

Cross-layer designs have been addressed in much recent literature. Krishnamachari et al. [19] proposed a cross-layer framework to enhance the performance of video streaming; this framework adaptively optimizes link layer ARQ, application layer FEC and packetization according to wireless channel conditions. In [20], a cross-layer design was developed to control the transmission of video streams over wireless links based on information about the prefetched video (application layer) and on the signal strength and multiple access interference (physical layer).

1.2.4 Problem statement

In this dissertation, we raise the following research questions regarding the improvement of perceived speech quality for wireless VoIP through a cross-layer approach.

What are the impairment factors of wireless VoIP applications?

What are the pros and cons of ARQ mechanisms? Is the performance of a wireless VoIP system improved by ARQ mechanisms in terms of perceived speech quality?

How can current ARQ schemes be optimized to improve speech quality? And how can real-time network and wireless channel QoS parameters be mapped into ARQ protocol optimization?

What are the effects of the interactions between ARQ mechanisms and other components of the wireless VoIP system? How can these effects be handled if they are negative?

How can other packet error concealment technologies be used together with ARQ? Or how can ARQ be used as a complementary mechanism for other packet error concealment technologies?

How can a cross-layer framework be established in which the QoS techniques located in different layers are optimized through a joint-layer analysis? And how can a profile of real-time predicted speech quality and QoS parameters collected from different layers be established and eventually used as the scheduler of a cross-layer framework?

Bearing these questions in mind, we have reviewed a large body of related literature and carried out research towards their solutions.

1.3 Aims and Objectives

The aim of this project is to develop and evaluate a cross-layer framework to improve perceived speech quality for wireless VoIP systems. This framework is expected to utilize QoS parameters from multiple layers and to optimize QoS techniques located in different layers based on a joint-layer analysis, thereby achieving an efficient and significant speech quality improvement that may be very hard or even impossible for single-layer approaches.

1.4 Thesis Contributions

The contributions of this dissertation are listed hereafter:

We identify the impairment factors for perceived speech quality of wireless VoIP and focus specifically on the impact of ARQ mechanisms. We use an objective measure of perceived conversational speech quality (MOSc) as a metric to evaluate the performance of three current retransmission schemes, namely no retransmission, Speech Property-Based (SPB) [21] retransmission and full retransmission, while considering the impact of retransmission jitter. Our findings indicate that the performance of the retransmission mechanisms is a function of both the wireless link quality and the delay introduced in the wireline network. Moreover, SPB retransmission, which is intended to protect only perceptually important speech frames, may not achieve the expected performance because it introduces too much jitter.

We propose a new perceived speech quality driven retransmission mechanism [22] which may be used to improve speech quality for wireless VoIP (in terms of the objective mean opinion score) by switching between no retransmission and full retransmission according to the communication conditions. Through simulations, we show that the proposed method achieves an optimum MOSc compared to no retransmission, full retransmission and SPB retransmission, and that it achieves retransmission efficiency similar to SPB retransmission while avoiding the implementation complexity of obtaining the speech property information that SPB retransmission requires.

We propose a cross-layer design in which: 1) the retransmission procedure of the link layer Automatic Repeat reQuest (ARQ) protocol is constrained by the available delay budget estimated by the application-level playout buffer; 2) if the retransmission procedure is terminated prematurely, the received noisy copies of a speech packet are presented to the application layer and finally played out; 3) in the playout delay estimation, the delivery delay in the wireless channel is estimated separately and constrained to avoid delay accumulation in the transmitting queue. The simulation results show that the perceived speech quality of a wireless VoIP system can be significantly enhanced, since this design reduces retransmission delay, playout buffer losses, and queuing delay and losses.

1.5 Organization of the Thesis

The rest of this dissertation is organized as follows. Chapter 2 introduces some basic theories related to this project, such as speech quality evaluation, adaptive playout buffers and the Automatic Repeat reQuest (ARQ) protocol. In Chapter 3, we look at the impairment factors introduced by ARQ schemes and introduce a perceived speech quality driven retransmission scheme to achieve optimum conversational speech quality. In Chapter 4, we consider problems introduced by an ARQ protocol when it works with other components of a Wireless VoIP system (e.g. the transmitting queue and the adaptive playout buffer) in the layered protocol architecture, and propose a cross-layer design as a solution to these problems. Finally, in Chapter 5 we discuss the research outcomes of this project, present extensions and ideas for future work, and conclude the thesis.



CHAPTER 2

BACKGROUND THEORIES

2.1 Speech Quality Evaluations

2.1.1 Objective Speech Quality Measurement

In voice communications, the mean opinion score (MOS) provides a numerical measure of the quality of human speech at the receiving end. MOS indicates the speech quality perceived by the listener and ranges from 1 (bad) to 5 (excellent), as presented in Table 2-1. A number of methods are available to measure the speech quality of a VoIP system. Broadly, speech quality measurements can be divided into two categories: subjective measurements and objective measurements. Subjective speech quality measurement requires a large group of listeners to take part in the test; it is time-consuming, hard to repeat and expensive. Compared with subjective tests, objective tests are repeatable and automatic, and do not suffer from environmental effects.

The most popular objective measurements are Perceptual Evaluation of Speech Quality (PESQ) [23] and the E-model [24]. PESQ is categorized as an intrusive speech quality measurement, as it requires both the original speech signal and the degraded one to perform the quality evaluation. The E-model, by contrast, is a non-intrusive speech quality measurement: it is parameter-based and does not require the original speech signal.



Quality Scale | Score | Listening Effort Scale
Excellent | 5 | No effort required
Good | 4 | No appreciable effort required
Fair | 3 | Moderate effort required
Poor | 2 | Considerable effort required
Bad | 1 | No meaning understood with reasonable effort

Table 2-1 MOS scale

2.1.2 PESQ

PESQ was specifically developed for end-to-end voice quality testing under real network conditions. It compares the reference and degraded signals and produces a quality score. A simplified system model of PESQ is given in Figure 2-1. It consists of three key modules: a time alignment module, a perceptual transform module and a cognition/judgment module. The time alignment module synchronizes the degraded signal with the reference signal. The perceptual transform module transforms each signal into a psychophysical representation that approximates human perception. The cognition/judgment module maps the difference between the original (reference) signal and the distorted (degraded) signal to an estimated perceptual distortion, which is then mapped onto the Mean Opinion Score (MOS) scale.

Figure 2-1 Basic Structure of Perceptual Evaluation of Speech Quality (blocks: Original Speech, Distorted Speech, Time Alignment Module, Perceptual Transform Module for each signal, Estimated Distortion, Cognition/Judgment Module)

The scores given by PESQ have been calibrated using a large database of subjective tests. PESQ takes into account signal degradations such as coding distortions, errors, packet losses, delay and variable delay, and filtering; it handles them by means of transfer function equalization, time alignment, and a new algorithm for averaging distortions over time. However, PESQ does not take into account the subjective effect of level changes in the network, echo, or the effect of round-trip delay on conversation.

2.1.3 E-Model

The E-Model is a computational model standardized by ITU-T in [24][27][28]. It uses transmission parameters to predict the subjective speech quality of packetized voice. The E-Model has proven useful as a transmission-planning tool for assessing the combined effects of variations in several transmission parameters that affect the conversational quality of telephony [24]. The primary output of the E-Model is the "Rating Factor" R, which can be further transformed into estimates of customer opinion by mapping it onto the MOS scale.

The E-Model equation for the "Rating Factor" is

$$R = R_0 - I_s - I_d - I_e + A$$

This equation results in an R factor between 0 and 100. The components of R are: R0, the base R value (noise level); Is, representing the effects of impairments occurring simultaneously with the speech signal; Id, representing the impairments caused by delay; Ie, representing the effects of "equipment" such as DCME or Voice over IP networks; and A, the advantage factor, which compensates for the allowance users make for poor quality when given some additional convenience (e.g. 0 for wireline and 10 for GSM).

Delay impairment Id

The Id factor models the quality degradation due to one-way or “mouth-to-ear” delay. Id can be computed from the one-way delay as [29]:

$$I_d = 0.024\,T_a + 0.11\,(T_a - 177.3)\,H(T_a - 177.3)$$

where

$$H(x) = \begin{cases} 0, & x < 0 \\ 1, & x \ge 0 \end{cases}$$

and Ta represents the one-way (or “mouth-to-ear”) delay in milliseconds.
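As a quick check of the delay impairment formula, the Python sketch below transcribes Id and the step function H directly. The function names are illustrative, and the code covers only the piecewise approximation quoted above, not the full ITU-T E-Model delay calculation.

```python
def heaviside(x):
    """Step function H(x): 0 for x < 0, 1 for x >= 0."""
    return 1.0 if x >= 0 else 0.0


def delay_impairment(ta_ms):
    """Delay impairment Id for a one-way (mouth-to-ear) delay Ta in milliseconds.

    Id = 0.024*Ta + 0.11*(Ta - 177.3)*H(Ta - 177.3)
    """
    excess = ta_ms - 177.3
    return 0.024 * ta_ms + 0.11 * excess * heaviside(excess)


if __name__ == "__main__":
    for ta in (100.0, 150.0, 177.3, 250.0):
        print(f"Ta = {ta:6.1f} ms -> Id = {delay_impairment(ta):.2f}")
```

Because of the H term, Id grows by 0.024 per millisecond below 177.3 ms and by 0.134 per millisecond above it, which is why delays beyond roughly 177 ms reduce quality much faster.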



Equipment impairment Ie

The equipment impairment Ie captures the distortion of the original voice signal due to low-rate codecs and to packet losses in both the network and the playout buffer. Currently, the E-Model can only cope with the speech distortion introduced by a limited number of codecs, e.g. G.729 or G.723.

Mapping R factor into MOS scale

We can map R onto the MOS scale by the following equations [24]:

$$\mathrm{MOS} = \begin{cases} 1, & R \le 0 \\ 1 + 0.035R + 7\times10^{-6}\,R\,(R-60)\,(100-R), & 0 < R < 100 \\ 4.5, & R \ge 100 \end{cases}$$
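The rating-factor equation and the R-to-MOS mapping can be combined in a few lines. This is a minimal sketch: the impairment values are inputs that would be obtained elsewhere (for example Id from the delay formula above), the function names are illustrative, and the numbers in the usage line are arbitrary rather than taken from this thesis.

```python
def rating_factor(r0, i_s, i_d, i_e, a=0.0):
    """E-Model rating factor R = R0 - Is - Id - Ie + A."""
    return r0 - i_s - i_d - i_e + a


def r_to_mos(r):
    """Piecewise mapping of the rating factor R onto the MOS scale."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + 7e-6 * r * (r - 60.0) * (100.0 - r)


if __name__ == "__main__":
    # Arbitrary illustrative impairment values.
    r = rating_factor(r0=93.2, i_s=0.0, i_d=4.2, i_e=11.0)
    print(f"R = {r:.1f}, MOS = {r_to_mos(r):.2f}")
```

The three branches agree at their boundaries (the middle formula gives 1 at R = 0 and 4.5 at R = 100), so the mapping has no jumps.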

2.1.4 Conversational speech quality evaluation

Perceived speech quality during a VoIP conversation can be expressed as a conversational Mean Opinion Score (MOSc). MOSc values can be obtained by subjective tests or by objective evaluation methods such as the E-Model. As described in Section 2.1.3, the E-Model consists of very complicated equations and is not applicable to some impairment factors, such as some codecs or bit errors in the payload. A prediction method for perceived conversational speech quality has been proposed in [29]. The schematic diagram of this method is illustrated in Figure 2-2.

Figure 2-2 Schematic diagram for MOSc measurement (blocks: Reference speech, Encoder, Loss process driven by trace data (loss), Decoder, Degraded speech, PESQ, MOS->R (E-Model concepts) giving Ie, Delay model driven by trace data (delay) giving Id, MOSc)

In this method, the MOS index produced by PESQ is first transformed to the R scale by

$$R_{pesq} = 3.026x^3 - 25.314x^2 + 87.060x - 57.336$$

where x represents the MOS index from PESQ. The equipment impairment factor is then computed as Ie = R0 - Rpesq. Combined with the delay impairment factor Id, this gives the R value R = R0 - Id - Ie, from which MOSc is obtained using the standard E-Model equations. Hence, the impairments due to delay, packet loss, coding and bit errors are all reflected in the evaluated value of MOSc.
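The MOSc procedure of [29] chains three steps: map the PESQ score onto the R scale, turn it into Ie, and combine it with Id. The Python sketch below follows those steps. R0 = 93.2 is an assumed value, but R0 cancels out algebraically (R = R0 - Id - (R0 - Rpesq) = Rpesq - Id), so its exact value does not affect MOSc here. Function names are illustrative.

```python
R0_ASSUMED = 93.2  # assumed base value; it cancels out in the final R below


def pesq_to_r(x):
    """Map a PESQ MOS index x onto the R scale (third-order polynomial)."""
    return 3.026 * x**3 - 25.314 * x**2 + 87.060 * x - 57.336


def delay_impairment(ta_ms):
    """Id = 0.024*Ta + 0.11*(Ta - 177.3)*H(Ta - 177.3)."""
    excess = ta_ms - 177.3
    return 0.024 * ta_ms + (0.11 * excess if excess >= 0 else 0.0)


def r_to_mos(r):
    """Standard E-Model mapping from R to MOS."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1.0 + 0.035 * r + 7e-6 * r * (r - 60.0) * (100.0 - r)


def conversational_mos(pesq_mos, one_way_delay_ms, r0=R0_ASSUMED):
    """MOSc from a PESQ score and an end-to-end one-way delay in ms."""
    i_e = r0 - pesq_to_r(pesq_mos)            # equipment impairment Ie = R0 - Rpesq
    i_d = delay_impairment(one_way_delay_ms)  # delay impairment Id
    r = r0 - i_d - i_e                        # R = R0 - Id - Ie
    return r_to_mos(r)


if __name__ == "__main__":
    for delay in (50.0, 150.0, 250.0):
        print(f"PESQ 3.8, {delay:.0f} ms -> MOSc {conversational_mos(3.8, delay):.2f}")
```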

2.2 Adaptive Playout Buffer

Playout buffers can be fixed or adaptive. In a fixed playout buffer, the playout delay for a packet stream is preset before a conversation begins, so the buffer cannot readily adapt to time-varying network conditions and may result in poor speech quality. For this reason, adaptive playout buffers are considered. Much work has been done on adaptive playout buffer algorithms that seek the best balance between playout delay and packet losses in the playout buffer. Recent work addressing the problem specifically for the Internet can be found in [30][31][32][33]. In this section, we briefly review some playout buffer algorithms from this literature. Details of how the adaptive playout buffer is applied in our Wireless VoIP system can be found in Chapters 3 and 4.

In [30], Ramjee et al. proposed four algorithms ('exp-avg', 'fast-exp', 'min-delay' and 'spk-delay') that adjust the playout delay according to the estimated network delay. These algorithms estimate the mean $\hat{d}_i$ and the variation $\hat{v}_i$ of the network delay on the arrival of the ith packet. The playout delay is adjusted at the beginning of each talkspurt. Let ti be the timestamp of packet i, the first packet in a talkspurt; its playout time pi is computed as

$$p_i = t_i + \hat{d}_i + \mu \cdot \hat{v}_i$$

where µ is a constant. The playout time pj of each subsequent packet j in the same talkspurt is computed as

$$p_j = p_i + t_j - t_i$$

(see Figure 2-3 for the related timing notation).

Figure 2-3 Timing associated with packet i (sender and receiver time lines; labels ti, ni, ai, pi, bi and $\hat{d}_i$)

In all four algorithms, $\hat{v}_i$ is given by

$$\hat{v}_i = \alpha\,\hat{v}_{i-1} + (1-\alpha)\,\lvert \hat{d}_i - n_i \rvert$$

but they differ in the computation of $\hat{d}_i$.
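The per-talkspurt scheduling rule and the shared variation estimate can be sketched as follows. The mean-delay estimate d_hat is passed in, since that is where the four algorithms differ. The defaults α = 0.998002 and µ = 4 are illustrative choices (they match values quoted elsewhere in this thesis), all times are in milliseconds, and the function names are hypothetical.

```python
def update_variation(v_prev, d_hat, n_i, alpha=0.998002):
    """v_i = alpha * v_{i-1} + (1 - alpha) * |d_i - n_i|."""
    return alpha * v_prev + (1.0 - alpha) * abs(d_hat - n_i)


def talkspurt_playout_times(timestamps, d_hat, v_hat, mu=4.0):
    """Playout times for the packets of one talkspurt.

    The first packet i is scheduled at p_i = t_i + d_hat + mu * v_hat, and each
    later packet j keeps the sender's spacing: p_j = p_i + (t_j - t_i).
    """
    t_first = timestamps[0]
    p_first = t_first + d_hat + mu * v_hat
    return [p_first + (t - t_first) for t in timestamps]


if __name__ == "__main__":
    # Three packets sent 20 ms apart, mean delay estimate 50 ms, variation 5 ms.
    print(talkspurt_playout_times([0.0, 20.0, 40.0], d_hat=50.0, v_hat=5.0))
    # [70.0, 90.0, 110.0]
```

Because only the first packet's playout time depends on the estimates, adjustments happen between talkspurts, and packets within a talkspurt keep their original relative spacing.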

1) exponential-average (exp-avg): In this algorithm, the mean delay $\hat{d}_i$ is estimated through an exponentially weighted average [30]:

$$\hat{d}_i = \alpha\,\hat{d}_{i-1} + (1-\alpha)\,n_i$$

where ni is the one-way delay of the ith packet. The value of α is chosen to be 0.998002 in [30].

2) fast exponential-average (fast-exp): This algorithm is a modified version of exp-avg. fast-exp computes the weighted mean of ni as [30]:

$$\hat{d}_i = \begin{cases} \beta\,\hat{d}_{i-1} + (1-\beta)\,n_i, & n_i > \hat{d}_{i-1} \\ \alpha\,\hat{d}_{i-1} + (1-\alpha)\,n_i, & n_i \le \hat{d}_{i-1} \end{cases}$$

where α and β are constants satisfying 0 < β < α < 1. In [30], α = 0.998002 and β = 0.750000; applying the smaller weight β when delay increases allows fast-exp to adapt more quickly to increases in delay.

3) minimum delay (min-delay): This algorithm is more aggressive in minimizing delays. It uses the minimum delay of all packets received in the current talkspurt. Letting Si be the set of these packets [30]:

$$\hat{d}_i = \min_{j \in S_i} n_j$$

4) spike delay detection (spk-delay): This algorithm focuses on spikes, i.e. sudden and large increases in delay over a sequence of packets. Normally, spk-delay obtains the playout delay using the same equation as exp-avg, except that α is set to 0.875 in [wan]. During a spike, however, spk-delay uses

$$\hat{d}_i = \hat{d}_{i-1} + n_i - n_{i-1}$$

to catch up with the sudden increase in delays.
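The four mean-delay estimators differ only in how they update d_hat. The Python sketch below shows one update step for each, using the constants quoted above. Spike detection for spk-delay is not described in this section, so the sketch takes an `in_spike` flag from the caller; that flag and all function names are illustrative assumptions.

```python
ALPHA = 0.998002    # exp-avg / fast-exp weight from [30]
BETA = 0.75         # fast-exp weight applied when delay increases
ALPHA_SPK = 0.875   # spk-delay weight outside spikes


def exp_avg(d_prev, n_i, alpha=ALPHA):
    """exp-avg: d_i = alpha * d_{i-1} + (1 - alpha) * n_i."""
    return alpha * d_prev + (1.0 - alpha) * n_i


def fast_exp(d_prev, n_i, alpha=ALPHA, beta=BETA):
    """fast-exp: same update, but with the smaller weight beta when n_i > d_{i-1}."""
    w = beta if n_i > d_prev else alpha
    return w * d_prev + (1.0 - w) * n_i


def min_delay(talkspurt_delays):
    """min-delay: smallest delay among packets received in the current talkspurt."""
    return min(talkspurt_delays)


def spk_delay_step(d_prev, n_i, n_prev, in_spike, alpha=ALPHA_SPK):
    """spk-delay: follow delay changes one-for-one during a spike, exp-avg otherwise.

    `in_spike` is supplied by the caller; spike detection is not modelled here.
    """
    if in_spike:
        return d_prev + n_i - n_prev
    return exp_avg(d_prev, n_i, alpha)
```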

We also present here some more complex algorithms, which have been developed

based on the four classical algorithms described above.

5) window: This algorithm is proposed in [31]. Like spk-delay, it detects spikes. During a spike, the delay of the first packet in the spike is used as the playout delay. After the spike, the playout delay is chosen as the delay corresponding to the qth quantile of the distribution of delays of the last N (10,000 in [31]) packets received.

6) adaptive: In [32], Sun et al. proposed an 'adaptive' algorithm that adapts to different networks. It switches between min-delay and fast-exp depending on whether $\hat{d}_i$ is higher than a delay threshold (e.g. 150 ms).

7) E-MOS: Fujimoto et al. [33] proposed a playout buffer algorithm called E-MOS, which models the delay distribution with a Pareto distribution. The Pareto delay distribution is combined with the packet loss ratio in a function Q(d) that models the impact of delay and packet loss on speech quality, expressed as MOS. When a packet is received, E-MOS uses its measured one-way delay to update the Pareto distribution. The value of d that maximizes the speech quality Q(d) is then chosen as the playout delay.
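For the window algorithm, the post-spike rule reduces to a quantile over a sliding window of recent delays. The sketch below implements only that part and leaves out spike handling. The default q = 0.99 is an assumed value, since the quantile is not stated in this section, while the default N = 10,000 follows the text; the class name is hypothetical.

```python
import math
from collections import deque


class WindowQuantilePlayout:
    """Playout delay = q-th quantile of the last N observed one-way delays."""

    def __init__(self, window_size=10_000, q=0.99):
        if not 0.0 < q <= 1.0:
            raise ValueError("q must be in (0, 1]")
        self.delays = deque(maxlen=window_size)  # oldest delays drop out automatically
        self.q = q

    def observe(self, delay_ms):
        self.delays.append(delay_ms)

    def playout_delay(self):
        if not self.delays:
            raise ValueError("no delays observed yet")
        ordered = sorted(self.delays)
        # Smallest observed delay that covers at least a fraction q of the window.
        k = max(0, math.ceil(self.q * len(ordered)) - 1)
        return ordered[k]
```

The quantile is recomputed on demand by sorting, which is simple but O(N log N) per call; a real implementation would likely keep an order-statistics structure instead.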

2.3 Automatic Repeat reQuest (ARQ)

Automatic Repeat reQuest (ARQ) is an error-control system in which the receiver generates a request for retransmission when it detects an error in transmission. A very basic ARQ scheme includes only error detection and retransmission capabilities. If a packet is found to contain errors after decoding, the packet is discarded and a retransmission is requested from the source, which then retransmits an exact copy of the packet. This process may be repeated indefinitely, but normally an upper bound is set on the number of retransmissions. If errors still persist after the maximum number of allowed retransmissions is reached, higher layers have to decide how to handle the situation. For retransmission procedures using ARQ, the three most popular schemes are [16]:

Stop and Wait (SW)

In SW-ARQ, the sender, after delivering the first copy of a packet in its buffer, is blocked until either a positive acknowledgement (ACK) is received or the timeout expires. In the first case, the sender drops the successfully delivered packet from its buffer and transmits the next packet; in the second case, the sender simply retransmits the same packet.
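The Stop-and-Wait procedure with an upper bound on retransmissions can be sketched as a small Monte Carlo model. This is an illustration under simplifying assumptions: each transmission is corrupted independently with probability `per` (a Bernoulli channel), ACKs are never lost, and the timeout duration is not modelled. The function name is hypothetical.

```python
import random
from typing import Tuple


def send_stop_and_wait(per: float, retry_limit: int, rng: random.Random) -> Tuple[bool, int]:
    """Deliver one packet with Stop-and-Wait ARQ.

    Returns (delivered, attempts). The first copy plus up to `retry_limit`
    retransmissions are tried; each copy is corrupted with probability `per`.
    """
    attempts = 0
    for _ in range(1 + retry_limit):
        attempts += 1
        if rng.random() >= per:
            return True, attempts   # copy received intact: ACK returned, move on
        # no ACK before the timeout: retransmit the same packet
    return False, attempts          # limit reached: higher layers must handle the loss


if __name__ == "__main__":
    rng = random.Random(42)
    results = [send_stop_and_wait(0.1, 6, rng) for _ in range(100_000)]
    lost = sum(1 for ok, _ in results if not ok)
    mean_attempts = sum(n for _, n in results) / len(results)
    print(f"lost={lost}, mean attempts={mean_attempts:.3f}")
```

With per = 0.1 the mean number of attempts should come out close to 1/(1 - 0.1), roughly 1.11, and each extra attempt corresponds to another wait for an ACK, which is where the retransmission delay comes from.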

Go Back N (GBN)

The sender continuously transmits the packets stored in its buffer until a Negative ACK (NACK) is received. The sender then stops transmitting new packets, goes back to the erroneously received packet, and retransmits a complete sequence of N packets starting with the NACKed packet, where N is the number of packets transmitted within an average round trip time.

Selective Repeat (SR)

In this case the sender also continuously transmits the packets stored in its buffer. Whenever a NACK is received, the sender stops transmitting new packets, goes back to the erroneously received packet, retransmits only that packet, and then resumes transmitting new packets. Because the successfully received packets following the corrupted packet are not retransmitted, this scheme achieves better efficiency.



CHAPTER 3

PERCEIVED SPEECH QUALITY DRIVEN

RETRANSMISSION MECHANISM

3.1 Introduction

Quality of Service (QoS) support for voice over IP (VoIP) in wireless/mobile networks is an important issue for technical and commercial reasons. However, speech quality for VoIP suffers from high packet loss rates and other impairments on the wireless link. Retransmission mechanisms, such as automatic repeat request (ARQ), have been incorporated in wireless and cellular networks to retransmit lost packets and so improve the performance of data transmission over wireless links. In wireless networks such as 802.11b [1], the retransmission mechanism is a simple Stop & Wait algorithm implemented at the Medium Access Control (MAC) layer, in which each transmitted packet must be acknowledged before the next packet can be sent. If the sender of a frame does not receive an acknowledgement within a certain timeout period, it retransmits the frame until a maximum retransmission limit is reached. When wireless link quality is poor, retransmission of MAC frames can effectively recover corrupted packets that contain bit errors.

However, retransmission schemes may introduce excessive delays, which have significant adverse effects on delay-sensitive real-time applications such as VoIP. A simple retransmission scheme can therefore degrade perceived speech quality in VoIP, and there is a tradeoff between packet loss and delay across the various retransmission schemes. Improved retransmission mechanisms such as Speech Property-Based ARQ (SPB-ARQ) [21] and the Hybrid loss recovery scheme [34] have been proposed to reduce speech distortion by protecting packets that are perceptually more relevant. However, these schemes were assessed only in terms of listening-only quality and do not consider the impact of delay, which is important for conversation and interactivity. Further, they do not consider the impact of retransmission jitter. Since adaptive jitter buffers discard retransmitted packets that arrive too late, the characteristics of the retransmission jitter introduced by different retransmission schemes should be considered.

The primary aim of the study reported in this chapter is to investigate new retransmission mechanisms that improve speech quality for wireless VoIP. In this study, we use a perceived conversational speech quality assessment method [29], instead of a listening-only method or individual network parameters (e.g. packet loss and delay), to evaluate the performance of current retransmission mechanisms (No retransmission, Full retransmission and SPB retransmission). We also present a new retransmission policy that switches to the most suitable retransmission mechanism depending on wireless link quality and network delay conditions. The ultimate aim of this perceived speech quality driven policy is to achieve optimum speech quality (in terms of the conversational Mean Opinion Score, MOSc) in the face of network impairments and varying wireless channel conditions, while taking into account the interaction between retransmission jitter and adaptive jitter buffers.

3.2 Related Works

3.2.1 Speech property-based retransmission mechanisms

Speech Property-Based QoS control schemes are based on the fact that, when encoded speech is transferred through packet networks, some voice frames are perceptually more important than others. Recent experimental results [35] show that in some popular codecs used in wireless applications (e.g. AMR), the position of a frame loss has a significant influence on perceived speech quality. In such codecs, frame loss concealment techniques interpolate the parameters of lost frames from the parameters of previous frames. Voiced frames lost at the beginning of a talkspurt will therefore be concealed using the decoding information of the preceding unvoiced frames. However, because voiced sounds always have higher energy than unvoiced sounds, concealing these frames with lower-energy unvoiced frames causes serious degradation in speech quality. Moreover, at the unvoiced/voiced transition, it is difficult for the decoder to correctly conceal the loss of voiced frames using the filter coefficients and excitation of an unvoiced sound, especially when burst loss occurs or the frame size grows.

To maximize perceptual quality at the receiving end, perceptually important voice packets may be protected by giving them high priority, while unimportant packets are handled as 'best-effort'. SPB retransmission, a retransmission scheme that protects only perceptually important speech frames, is presented in [21] [34]. Experimental results reported in [21] show that SPB retransmission can provide better speech quality (assessed by EMBSD) than the No retransmission scheme, which does not retransmit any packet. In [34], SPB retransmission was shown to be more efficient in reducing retransmission delays than Full retransmission, which retransmits every unacknowledged (unACKed) packet.

3.2.2 Measuring conversational speech quality

In previous studies [21][34], retransmission schemes were assessed using the EMBSD algorithm, which only considers the distortion caused by packet loss. In practice, however, both packet loss and delay are crucial in voice conversation, and long retransmission delays (e.g. due to long network delay) can seriously impair speech quality. The E-model was introduced by the ITU as a non-intrusive quality assessment method for measuring voice quality. Unfortunately, the E-model is only applicable to a limited number of codecs, which at present does not include the AMR codec. In our simulation, we therefore employed the conversational MOS [29] to quantify the performance of different retransmission schemes. In the conversational speech quality evaluation (see Chapter 2), ITU PESQ is first used to quantify the impact of packet loss on speech quality. The result is then converted to the equipment impairment Ie, and the average end-to-end delay effect, Id, is calculated. Finally, the E-model is used to obtain a measure of the speech quality, MOSc, based on Ie and Id (see Figure 3-1).



3.2.3 Adaptive jitter buffer and retransmission jitters

In VoIP applications, jitter is compensated for at the receiver by a jitter buffer. The size of a jitter buffer can be fixed or adjustable. Fixed jitter buffers cannot adapt readily to changes in network delay and, as a result, are not practical in real VoIP applications. In our study, we investigated fast-exp, one of the classical adaptive jitter buffer algorithms proposed in [30]. By using a smaller weighting factor when delay increases, the fast-exp algorithm can quickly adapt to such increases while avoiding discarding too many packets. When a packet arrives, it estimates the current mean network delay (denoted $\hat{d}_i$) and the current variation of network delay (denoted $\hat{v}_i$). The mean delay estimation equation is given by:

$$\hat{d}_i = \begin{cases} \beta\,\hat{d}_{i-1} + (1-\beta)\,n_i, & n_i > \hat{d}_{i-1} \\ \alpha\,\hat{d}_{i-1} + (1-\alpha)\,n_i, & n_i \le \hat{d}_{i-1} \end{cases}$$

where ni is the network delay of the ith packet, β = 0.75 and α = 0.998002. The following equation is used to estimate $\hat{v}_i$:

$$\hat{v}_i = \alpha\,\hat{v}_{i-1} + (1-\alpha)\,\lvert \hat{d}_i - n_i \rvert$$

At the beginning of a talkspurt, the adaptive jitter buffer sets the playout delay using the equation

$$D = \hat{d}_i + \mu\,\hat{v}_i$$

where D is the playout delay and µ is a constant that can be selected from 1 to 20. We set µ to 4 in our simulation. It should be noted that for VoIP over wireless, the network delay consists of the delays introduced by the wireline network and by the wireless link. Jitter can be introduced by congestion in the wireline network or by retransmissions/propagation on the wireless link. Given that most jitter buffer algorithms were proposed to compensate for congestion-induced jitter, it is worthwhile to investigate the impact of retransmission jitter on VoIP over wireless.
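To see how retransmission jitter feeds into this estimator, the sketch below runs fast-exp with the parameters above (β = 0.75, α = 0.998002, µ = 4) over two synthetic delay traces: a constant 20 ms delay, and the same trace with every tenth packet delayed by an extra 10 ms as a stand-in for one MAC retransmission. The traces and the 10 ms penalty are hypothetical values chosen only to illustrate the effect.

```python
ALPHA, BETA, MU = 0.998002, 0.75, 4.0


def fast_exp_playout(delays_ms):
    """Return the playout delay D = d_hat + MU * v_hat after each packet of a trace."""
    d_hat, v_hat = delays_ms[0], 0.0
    playout = []
    for n in delays_ms:
        w = BETA if n > d_hat else ALPHA  # react quickly to delay increases
        d_hat = w * d_hat + (1.0 - w) * n
        v_hat = ALPHA * v_hat + (1.0 - ALPHA) * abs(d_hat - n)
        playout.append(d_hat + MU * v_hat)
    return playout


if __name__ == "__main__":
    clean = [20.0] * 500
    # Hypothetical: every 10th packet needs one retransmission costing an extra 10 ms.
    retx = [30.0 if i % 10 == 9 else 20.0 for i in range(500)]
    print(f"D without retx jitter: {fast_exp_playout(clean)[-1]:.2f} ms")
    print(f"D with retx jitter:    {fast_exp_playout(retx)[-1]:.2f} ms")
```

Because delay increases are tracked with the small weight β while decreases decay with the large weight α, occasional retransmissions push the estimate up quickly and it relaxes only slowly, so the retransmission trace ends with a noticeably larger D.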


3.3 Simulation System Description



Our study is based on the network simulator ns-2 [36], in which we simulated a last-hop wireless scenario. Both the IEEE 802.11 and the Ethernet protocol stacks are implemented in the simulator. A two-way Bernoulli error model was inserted to simulate wireless link transmission errors. In 802.11, if the packet size exceeds the Maximum Transmission Unit (e.g. 1500 bytes for WaveLan), the packet will be fragmented. Since we set the packet size to 71 bytes, carrying one 12.2 kbit/s AMR speech frame per RTP packet, fragmentation does not occur.

The simulation system is given in Figure 3-1. In our simulation, the original speech file is first encoded by the AMR codec and then analyzed to extract the speech marking information (voiced/unvoiced) for each packet. The speech marking information is used together with the network delay and wireless link quality to control the retransmission policy. The error model determines whether a packet is corrupted according to the packet error probability (PER). The base station (BS) neither sends an ACK to the sender for a corrupted packet nor presents it to the higher layer. If the MAC layer of the sender has not received an acknowledgement for a packet, it retransmits the packet until the packet is ACKed or the limit of retransmission attempts is reached (we denote Retransmission as Retx in the rest of this chapter). In our simulation, we set the Retx attempt limit to 6 for both SPB Retx and Full Retx. At the receiver, the received speech packets are fed to an adaptive jitter buffer and subsequently decoded to recover the degraded speech file, which is used to obtain a measure of speech quality.

Figure 3-1 Simulation Environment (blocks: Mobile Host, Access Point, Fixed Host; RTP/UDP/IP/MAC/PHY and RTP/UDP/IP/Ethernet protocol stacks; Original Speech, AMR Encoder, Speech Marking, Retx. Limit Control with Network Delay and PER inputs; Adaptive Playout Buffer, AMR Decoder, Degraded Speech; Speech Quality Evaluation: PESQ, MOS/Ie, Delay (Id), E-Model, end-to-end MOSc)
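Under the Bernoulli error model, a packet is lost on the wireless link only if the first transmission and all 6 allowed retransmissions are corrupted. The short calculation below makes that explicit. It is a back-of-the-envelope estimate that assumes independent errors on every attempt and ignores ACK errors, not a model of the full ns-2 setup; the function names are illustrative.

```python
def residual_loss_probability(per, retry_limit):
    """Probability that all 1 + retry_limit transmissions of a packet are corrupted."""
    return per ** (retry_limit + 1)


def expected_attempts(per, retry_limit):
    """Mean transmissions per packet: sum of P(first k attempts all fail), k = 0..limit."""
    return sum(per ** k for k in range(retry_limit + 1))


if __name__ == "__main__":
    packets = 73_000  # number of speech packets transmitted in the simulations
    for per in (0.001, 0.01, 0.1):
        p_loss = residual_loss_probability(per, retry_limit=6)
        print(f"PER={per}: residual loss {p_loss:.1e}, "
              f"expected link losses {packets * p_loss:.2e}, "
              f"mean attempts {expected_attempts(per, 6):.4f}")
```

Even at PER = 0.1 the expected number of link losses over 73,000 packets is far below one, which is consistent with the observation that most voiced packet losses in Full Retx and SPB Retx come from the jitter buffer rather than from exceeding the Retx limit.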

In our study, we used the combined PESQ and E-Model method described in Chapter 2 to evaluate conversational speech quality. The performance index was obtained by averaging the results that this method produced for each 20 seconds of the speech file.

The following simulation results were obtained by averaging the results of 50 simulations with different random seeds, to remove the influence of particular packet loss locations. The three simulated retransmission schemes are SPB Retx, Full Retx and No Retx. Table 3-1 gives the average number of voiced packet losses when transmitting 73,000 speech packets over our simulated wireless network with these schemes. For simplicity, we simulated only the wireless link for the purpose of this study, so packet losses are caused only by the wireless link (Retx limit exceeded) and by the adaptive jitter buffer. In Table 3-1, most of the voiced packet losses in Full Retx or SPB Retx are caused by the jitter buffer. Because we deployed a Bernoulli error model in our simulation, most retransmitted packets are successfully received. If bursty packet errors were considered, we would expect more voiced packet losses in the Full Retx and SPB Retx schemes.

Table 3-1 Average voiced packet losses with fast-exp playout buffer

PER | No Retx | SPB Retx | Full Retx
0.0001 | 15 | 53 | 29
0.0005 | 36 | 54 | 27
0.0008 | 61 | 51 | 26
0.001 | 69 | 47 | 22
0.003 | 144 | 28 | 17
0.005 | 241 | 22 | 13
0.01 | 474 | 13 | 9
0.05 | 2344 | 42 | 16
0.10 | 4678 | 931 | 159

It seems straightforward that SPB Retx should be better than No Retx, and at least as good as Full Retx, at protecting voiced frames.



However, Table 3-1 shows that Full Retx always has fewer voiced packet losses than SPB Retx, while No Retx has the fewest lost voiced packets when link quality is good (packet error probability lower than 0.0005). This is because, in the fast-exp algorithm, the estimated playout delay increases as retransmission jitter increases. When link quality is good, the estimated playout delay stays at a low level, so occasionally retransmitted packets, and the packets adjacent to them, are discarded by the jitter buffer because of the jitter they introduce. In the No Retx scheme, by contrast, a corrupted packet does not affect the packets that follow it, which is why No Retx has the fewest packet losses when link quality is very good. On the other hand, in SPB Retx unvoiced packets are not retransmitted, so the estimated playout delay cannot reflect the current wireless link conditions when link quality becomes worse. In Full Retx, every unACKed packet is retransmitted, which helps the adaptive jitter buffer estimate the playout delay for the next talkspurt. That is why the adaptive jitter buffer discards more packets in SPB Retx than in Full Retx.

3.4 Performance Comparison of Current Retransmission Schemes

Figure 3-2 and Figure 3-3 compare the overall packet loss rates and the buffered retransmission delays. In Figure 3-2, we can see that Full Retx keeps the packet loss rate at a low level because every unACKed packet is retransmitted, at the expense of the higher delay plotted in Figure 3-3. Interestingly, when link quality is not too bad (packet error probability up to 0.01), the packet loss rate of the Full Retx scheme decreases as link quality becomes worse. As mentioned before, under worse link quality, more retransmissions help the jitter buffer estimate the playout delay more accurately. However, when link quality is very good (packet error probability up to 0.0005), No Retx obtains the lowest packet loss rate because it does not introduce any jitter and few packets are corrupted by bit errors. As a compromise, SPB Retx has a packet loss rate and Retx delay between those of No Retx and Full Retx.

Figure 3-2 Overall packet loss rate comparison (x-axis: Packet Error Probability, 10^-4 to 10^0; y-axis: Loss Rate (%), log scale; curves: No Retx, SPB Retx, Full Retx)

Figure 3-3 Buffered retx delay comparison (x-axis ticks 10^-4 to 10^0; y-axis: Buffered Retx Delay (ms), 0 to 300)

Figure 3-4 MOSc comparison with 175ms network delay (x-axis ticks 10^-4 to 10^0; y-axis: MOSc)

Figure 3-5 MOSc comparison with packet error probability 0.001 (x-axis: Network Delay, 100 to 300; y-axis: MOSc; curves: Perceived Quality Driven, No Retx, SPB Retx, Full Retx)

Using the evaluation method described in Chapter 2, Figure 3-4 and Figure 3-5 give a more direct performance comparison of these schemes with MOSc as the metric. To focus on the performance of the Retx schemes, our evaluation did not consider packet losses introduced in the wireline network; it did, however, consider network delay. In natural conversation, delays below 100 ms are hardly noticeable, but delays above 150 ms clearly affect conversational interactivity [37]. Since Retx delays rarely exceed 100 ms, to make the impact of Retx delay visible we assume that a 175 ms delay has been introduced in the wireline network and add it to the end-to-end delay in the MOSc evaluation. In Figure 3-4, the MOSc of Full Retx is lower than that of No Retx and SPB Retx when the packet error probability is lower than 0.003. This is because the Full Retx scheme always introduces more Retx delay, while perceived speech quality is sensitive to high delay when link quality is good. When the packet error probability exceeds 0.003, the Full Retx scheme becomes the best, as it greatly reduces the number of corrupted packets. Figure 3-5 compares performance for different network delays when the packet error probability is 0.001. When the delay is lower than 150 ms, Full Retx obtains the best MOSc; when the delay is higher than 150 ms, No Retx becomes the best, which confirms that 150 ms is the threshold above which delay begins to have a severe impact on speech quality. As in Figure 3-4, the performance of SPB Retx lies between No Retx and Full Retx, but it is not the best on either side of the delay threshold.

3.5 Perceived Speech Quality Driven Retransmission Scheme

Since No Retx and Full Retx each achieve the best MOSc under different link quality and network delay conditions, we propose a new perceived speech quality driven retransmission scheme that switches between these two schemes as link quality and network delay change. The pseudo code of the new scheme is shown in Figure 3-6. Low_Error_Threshold is set to 0.0005 and High_Error_Threshold to 0.003. These values follow from the simulation results: when the packet error probability is lower than 0.0005, No Retx achieves the best MOSc even if delay is not considered, whereas Full Retx becomes the best when the packet error probability exceeds 0.003, even if the network delay is very high. When the packet error probability is between 0.0005 and 0.003, the decision should be made according to network delay. In the proposed scheme, Delay_Threshold is set to 150 ms, as this is the threshold above which delay begins to noticeably affect speech quality. In real applications, the Bit Error Rate (BER) can be converted to PER, and BER can be estimated from the bit errors in bit-pattern sequences sent by the BS. Network delay can be estimated by subtracting the average MH (mobile host) to BS handoff delay from the average end-to-end delay, which can be retrieved from the RTP packet header.



The performance of the new perceived speech quality driven scheme under different network delays and packet error probabilities is also shown in Figure 3-4 and Figure 3-5. The curve of the perceived quality driven scheme overlaps with the parts of the No Retx and Full Retx curves where each of them achieves the best MOSc, because the new scheme switches to the more suitable of the two as communication conditions change. Since this method uses Full Retx only when necessary, it can also achieve retransmission efficiency similar to SPB Retx while avoiding the implementation complexity of obtaining the speech property information that SPB Retx requires.

Figure 3-6 Perceived speech quality driven Retx scheme pseudo code

if (PER < Low_Error_Threshold)           /* 0.0005 */
    No_Retx();
else if (PER > High_Error_Threshold)     /* 0.003 */
    Full_Retx();
else {
    if (Network_Delay < Delay_Threshold) /* 150 ms */
        Full_Retx();
    else
        No_Retx();
}
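For illustration, the decision rule of Figure 3-6 can be combined with the estimation steps described earlier in a short Python sketch: BER is estimated from the errors in a known bit pattern and converted to PER, and network delay is obtained by subtracting the wireless hop delay from the end-to-end delay. This is a minimal sketch under assumptions, not the implementation used in the simulations; the function names, the 1040-bit frame length (784 overhead bits plus a 256-bit AMR 12.2 payload, as in Chapter 4) and the sample inputs are illustrative.

```python
# Hypothetical sketch of the perceived speech quality driven Retx scheme.
LOW_ERROR_THRESHOLD = 0.0005   # below this PER, No Retx gives the best MOSc
HIGH_ERROR_THRESHOLD = 0.003   # above this PER, Full Retx gives the best MOSc
DELAY_THRESHOLD_MS = 150.0     # delay above which speech quality degrades severely


def ber_from_pattern(errored_bits: int, pattern_bits: int) -> float:
    """Estimate BER from errors observed in a known bit pattern (assumed input)."""
    return errored_bits / pattern_bits


def ber_to_per(ber: float, frame_bits: int) -> float:
    """PER of a frame of frame_bits bits under independent bit errors."""
    return 1.0 - (1.0 - ber) ** frame_bits


def network_delay(mean_e2e_ms: float, mean_wireless_hop_ms: float) -> float:
    """Network delay = average end-to-end delay minus average MH-to-BS delay."""
    return mean_e2e_ms - mean_wireless_hop_ms


def choose_scheme(per: float, net_delay_ms: float) -> str:
    """Return the retransmission scheme selected by the rule of Figure 3-6."""
    if per < LOW_ERROR_THRESHOLD:
        return "No_Retx"
    if per > HIGH_ERROR_THRESHOLD:
        return "Full_Retx"
    # Intermediate link quality: the decision depends on network delay.
    return "Full_Retx" if net_delay_ms < DELAY_THRESHOLD_MS else "No_Retx"


if __name__ == "__main__":
    ber = ber_from_pattern(errored_bits=1, pattern_bits=1_000_000)
    per = ber_to_per(ber, frame_bits=1040)
    delay = network_delay(mean_e2e_ms=180.0, mean_wireless_hop_ms=20.0)
    print(f"PER={per:.5f}, network delay={delay} ms -> {choose_scheme(per, delay)}")
```

With these sample inputs the estimated PER (about 0.001) falls between the two error thresholds, so the choice is made on network delay; 160 ms exceeds Delay_Threshold and No Retx is selected.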

3.6 Summary

A suitable retransmission scheme is crucial for obtaining the best possible perceived speech quality in wireless VoIP applications. In this chapter, we investigated the performance of three retransmission schemes (No Retx, SPB Retx and Full Retx) with regard to perceived conversational speech quality. The impact of retransmission jitter with an adaptive jitter buffer was also considered. The simulation results show that the performance of these schemes depends on the network delay and the wireless link quality. Since the wireless environment is variable, we have proposed a perceived speech quality driven retransmission scheme that adapts to wireless link quality and network delay conditions. As SPB Retx is not involved in the new method, the implementation complexity of retrieving speech property information is avoided. Our results show that the proposed method achieves the best MOSc compared with No Retx, Full Retx and SPB Retx, because it deploys the most suitable scheme as communication conditions change. In this study, a simplified last-hop wireless network was implemented to demonstrate the wireless VoIP scenario. Further improvements may be achieved by making the simulation closer to a real network, e.g. by incorporating a multi-state error model in the wireless link.



CHAPTER 4

PLAYOUT DELAY CONSTRAINED ARQ

and ARQ AWARE PLAYOUT BUFFER

4.1 Introduction

Due to the unreliable and error-prone nature of wireless channels, assuring acceptable perceived speech quality has been a challenging task for Wireless VoIP. Automatic Repeat on reQuest (ARQ) is one of the packet error recovery techniques for Wireless VoIP and may complement or substitute for Forward Error Correction (FEC) because of its efficiency and simplicity.

Figure 4-1 Stop and Wait ARQ (diagram: Tx Queue, Wireless Channel and Rx Buffer; frames n and n+1 with ACKn and ACKn+1; timer started, frame loss, timeout, backoff, timer restarted and timer stopped)



In ARQ, the sender sends packets, or Protocol Data Units (PDUs), consisting of payload and checksums. According to the result of checksum validation, the receiver sends back acknowledgment messages (e.g. ACK or NACK) to the transmitter, and the sender performs packet retransmissions based on these acknowledgments. Basically, ARQ protocols can be categorized into three types: Stop-and-Wait (SW), Go-Back-N (GBN) and Selective Repeat (SR), which differ in how they respond to acknowledgments. The details of these three types of ARQ have been described in Chapter 2. In this study, we consider the SW-ARQ in the IEEE 802.11 Medium Access Control (MAC) layer [1]. In the 802.11 SW-ARQ, a transmitted packet must be acknowledged before the next packet can be sent. If the sender does not receive an acknowledgement for a packet within a certain timeout period, it retransmits the packet until a maximum retry limit is reached. In the Distributed Coordination Function (DCF) mode of IEEE 802.11, a backoff procedure randomly defers each retransmission so as to avoid collisions between multiple transmitters (see Figure 4-1). With this procedure, corrupted packets may be recovered by the retransmitted copies. However, ARQ schemes also bring a series of problems that impact perceived speech quality. The retransmission procedure may introduce excessive delay; when packets have to traverse a high-delay wireline network before reaching the wireless part, any retransmission may be considered unnecessary [22]. Moreover, the number of retransmission attempts varies with wireless channel quality, which leads to retransmission jitter.
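The stop-and-wait behaviour described above, with a retry limit and a random backoff before each retransmission, can be sketched in Python to show how retransmissions turn into variable per-packet delay. This is a simplified model loosely based on the DCF backoff, not the 802.11 MAC itself: the per-attempt time, slot time, contention-window bounds and retry limit are illustrative assumptions, collisions are not modelled, and each transmission is simply corrupted with probability PER.

```python
import random
from dataclasses import dataclass


@dataclass
class TxResult:
    delivered: bool   # True if an ACK was received within the retry limit
    attempts: int     # total transmissions, including the first one
    delay_us: float   # time from first transmission to ACK or give-up


def sw_arq_send(per: float, rng: random.Random, retry_limit: int = 7,
                tx_time_us: float = 600.0, slot_us: float = 20.0,
                cw_min: int = 31, cw_max: int = 1023) -> TxResult:
    """Stop-and-wait ARQ for a single packet (illustrative parameters).

    Each transmission is corrupted with probability `per`; a missing ACK
    triggers a random backoff drawn from a contention window that roughly
    doubles after every failure, capped at `cw_max`.
    """
    delay = 0.0
    cw = cw_min
    for attempt in range(1, retry_limit + 2):   # first try plus retry_limit retries
        delay += tx_time_us                     # send, then ACK or timeout
        if rng.random() >= per:                 # frame intact, ACK returned
            return TxResult(True, attempt, delay)
        if attempt <= retry_limit:              # defer before the next retry
            delay += rng.randint(0, cw) * slot_us
            cw = min(2 * cw + 1, cw_max)
    return TxResult(False, retry_limit + 1, delay)


if __name__ == "__main__":
    rng = random.Random(1)
    delays = [sw_arq_send(0.3, rng).delay_us for _ in range(1000)]
    print(f"min {min(delays):.0f} us, max {max(delays):.0f} us")
```

Running it with PER = 0.3 shows the spread between the smallest and largest per-packet delays, which is the retransmission jitter discussed above.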

Further, the layered protocol architecture, which places ARQ and the playout buffer in different layers, makes matters worse. Firstly, if an adaptive playout buffer is employed in the Wireless VoIP system, a packet's delay budget, i.e. its playout delay, is decided at the beginning of each talkspurt. Since the retransmission procedure is only constrained by a fixed maximum retry limit, a high retry limit that exceeds the available delay budget may lead to unnecessary retransmissions and postpone subsequent packets, while a low retry limit may terminate the retransmission procedure prematurely although enough delay budget is left. Secondly, since a transmitting queue exists at the sender, a high mean retransmission delay can make incoming packets accumulate in the queue, so that queuing delay or losses quickly climb. Thirdly, in the current protocol stack, packets that fail transport or link layer checksum validation are discarded, even though noisy voice packets may still be useful at the upper layer [38].



These problems have been addressed in some previous works. In [39][40][41], the retransmission procedure is still constrained by a fixed maximum retry limit, but it can be terminated at a packet's deadline (e.g. its presentation time). Nevertheless, these works still cannot avoid terminating a retransmission procedure prematurely when there is still delay budget left for more retry attempts, and they do not consider the impact of retransmission delays on queuing delays or losses.

In [15], UDP-Lite, a modified UDP protocol with a partial checksum, was developed to allow corrupted UDP packets to be reused at the application level. However, for Wireless VoIP the MAC layer checksum should be made partial as well; otherwise, noisy packets would be discarded at the MAC layer and never reach the upper layers.
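The idea behind a partial checksum can be illustrated with the 16-bit one's-complement Internet checksum computed over only the first `coverage` bytes of a packet, similar in spirit to the checksum coverage of UDP-Lite. This Python sketch is a simplification for illustration only: real UDP-Lite also covers a pseudo-header, and the 12-byte "header" and 32-byte "payload" below are made-up values. A bit error in the uncovered payload leaves the checksum valid, so the noisy packet can still be passed upward.

```python
def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum of 16-bit words (RFC 1071 style)."""
    if len(data) % 2:
        data += b"\x00"                           # pad odd-length input
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return ~total & 0xFFFF


def partial_checksum(packet: bytes, coverage: int) -> int:
    """Checksum over only the first `coverage` bytes (header-only coverage)."""
    return internet_checksum(packet[:coverage])


def flip_bit(packet: bytes, bit_index: int) -> bytes:
    """Return a copy of `packet` with one bit inverted (a simulated bit error)."""
    buf = bytearray(packet)
    buf[bit_index // 8] ^= 1 << (bit_index % 8)
    return bytes(buf)


if __name__ == "__main__":
    header = bytes(range(12))   # hypothetical 12-byte header
    payload = bytes(32)         # hypothetical 32-byte speech payload
    pkt = header + payload
    ref = partial_checksum(pkt, coverage=len(header))
    # Error in the payload: not covered, checksum still matches.
    print(partial_checksum(flip_bit(pkt, 8 * 20), len(header)) == ref)
    # Error in the header: covered, checksum no longer matches.
    print(partial_checksum(flip_bit(pkt, 8 * 3), len(header)) == ref)
```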

We extended these ideas in a cross-layer design for Wireless VoIP, where the retransmission procedure is confined to the local wireless channel. In our design, link layer ARQ and the playout buffer cooperate in an integrated framework, in which 1) the retransmission procedure of a packet is constrained by the available delay budget; 2) speech data is not covered by the link layer or transport layer checksums, and a packet combining process is performed to obtain a least noisy packet from its retransmitted copies; and 3) the delivery delay in the wireless channel is estimated separately and limited to the mean inter-arrival delay of the transmitting queue. Simulation results show that, with the help of this design, the simulated Wireless VoIP system achieves a considerable performance improvement, at the expense of breaking the layered protocol architecture.

4.2 The Cross-Layer Design

Figure 4-2 The cross-layer design system model (diagram: Fixed Host with RTP/UDP/IP/Ethernet stack; Access Point with incoming queue holding packets a, b, c; Mobile Host with RTP/UDP/IP/802.11 MAC/PHY stack and playout buffer; retransmitted copies a1, a2, a3 pass through packet combining to the decoder at the playout time, when retransmission is terminated)

4.2.1 System model

The system model of the proposed cross-layer design is described in Figure 4-2. We considered the last-hop scenario in an IEEE 802.11 wireless network. Our design is composed of two correlated components: playout delay constrained ARQ, in which the playout delay becomes the stopping criterion of the retransmission procedure; and the ARQ aware playout buffer, which calculates the packet delivery delay separately for the wireline and wireless parts and keeps the wireless channel delay budget below the inter-arrival interval of incoming packets, so as to avoid accumulating queuing delay.

As speech data is not covered by the link layer and transport layer checksums, the playout buffer may receive several noisy versions of a packet. If a correct version of the packet has not been received by its presentation time, we employ Majority-Logic packet combining [44] to further reduce the damaged part and then send the combined version to the decoder. Details of this technique are presented in Appendix D.
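Majority-logic combining can be illustrated as a bitwise majority vote over the received copies: each output bit takes the value held by most of the copies. The Python sketch below is a simplified illustration of that idea, not the exact procedure of [44] or Appendix D; resolving ties (possible with an even number of copies) in favour of the first copy is an assumption made here.

```python
from typing import Sequence


def majority_combine(copies: Sequence[bytes]) -> bytes:
    """Bitwise majority vote over equal-length noisy copies of one packet.

    Ties, which can occur with an even number of copies, are resolved in
    favour of the first copy (an assumption of this sketch).
    """
    if not copies:
        raise ValueError("need at least one copy")
    length = len(copies[0])
    if any(len(c) != length for c in copies):
        raise ValueError("copies must have equal length")
    n = len(copies)
    out = bytearray(length)
    for i in range(length):
        byte = 0
        for bit in range(8):
            mask = 1 << bit
            ones = sum(1 for c in copies if c[i] & mask)
            if 2 * ones > n or (2 * ones == n and copies[0][i] & mask):
                byte |= mask
        out[i] = byte
    return bytes(out)


if __name__ == "__main__":
    original = b"\x12\x34\x56\x78"
    copy1 = b"\x13\x34\x56\x78"  # bit error in byte 0
    copy2 = b"\x12\x30\x56\x78"  # bit error in byte 1
    copy3 = b"\x12\x34\x56\xf8"  # bit error in byte 3
    print(majority_combine([copy1, copy2, copy3]) == original)
```

Because the three copies are damaged in different positions, the vote restores every bit; when errors hit the same bit in most copies, combining cannot recover it.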

The two key components of the cross-layer design are described in the following

subsections.

4.2.2 Playout delay constrained ARQ

Figure 4-3 Block diagram of the playout delay constrained ARQ with packet combining (flowchart blocks across the application and link layer interface: Received a packet?; Corrupted?; Send ACK; Present to upper layer; Playout time?; Wait for packet retransmission; Check received copies of the playout packet; Exist a correct version?; Send to Decoder; Majority-logic packet combining; Terminate current retransmission process)

The playout delay constrained ARQ is a specific optimization of the current protocol stack for Wireless VoIP. Its block diagram is given in Figure 4-3. At the receiver, the 802.11 MAC layer presents every received packet to the upper layer, whether it is corrupted or not. In the application layer, the playout buffer can terminate a packet's retransmission procedure at its playout time so as to avoid unnecessary retransmissions. If a corrupted packet has not been recovered by the retransmission procedure, the received noisy copies are combined by the packet combining module into a more reliable version, which is then decoded and played out.
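The sender-side behaviour implied by Figure 4-3 can be approximated by the following Python sketch: a packet is retransmitted until a correct copy is acknowledged or its playout deadline would be exceeded, and corrupted copies are kept for combining instead of being dropped. It is a simplified illustration under assumed conditions: each attempt takes a fixed time `attempt_ms` (backoff folded in), errors are independent with probability `per`, the deadline is measured from the first transmission, and all names are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass
class DeliveryLog:
    correct: bool      # a clean copy was acknowledged before the deadline
    attempts: int      # transmissions made before success or termination
    noisy_copies: int  # corrupted copies kept for packet combining


def deadline_constrained_arq(per: float, deadline_ms: float, attempt_ms: float,
                             rng: random.Random,
                             retry_limit: int = 1000) -> DeliveryLog:
    """Retransmit until a clean copy arrives or the playout deadline is reached.

    The retry limit is kept but set very high, so the playout deadline is
    normally the binding constraint. Corrupted copies are counted (they would
    be passed upward) instead of being discarded.
    """
    now = 0.0
    attempts = 0
    noisy = 0
    # Stop when another attempt could not complete before the playout time.
    while attempts <= retry_limit and now + attempt_ms <= deadline_ms:
        attempts += 1
        now += attempt_ms
        if rng.random() >= per:
            return DeliveryLog(True, attempts, noisy)
        noisy += 1
    # Playout time reached: terminate retransmission, combine the noisy copies.
    return DeliveryLog(False, attempts, noisy)


if __name__ == "__main__":
    log = deadline_constrained_arq(per=0.6, deadline_ms=60.0, attempt_ms=8.0,
                                   rng=random.Random(3))
    print(log)
```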

We still keep the maximum retry limit in the 802.11 SW-ARQ, but it is set high enough to avoid prematurely terminating the retransmission procedure when there is still delay budget left for more retry attempts. To allow corrupted packets to be passed from the link layer to the application layer, the link layer and transport layer checksums have to be made partial (e.g. UDP-Lite), and the mechanisms that eliminate duplicate PDUs should be turned off for the supported VoIP services. Furthermore, an application-level checksum, such as a CRC in the RTP packet, should be enabled so that the application layer can identify the correct packet among several copies.

4.2.3 ARQ aware playout buffer

4.2.3.1 Queue model

Assume there is a per-flow transmission queue at the sender with a large enough queue length, so that queue losses can be ignored and we can focus on the queuing

delay. With the IEEE 802.11 SW-ARQ, the transmission queue can be seen as an M/M/1 queuing system with Poisson packet arrivals and exponentially distributed packet departures [45]. Let $a$ be the average inter-arrival delay and $s$ the average packet departure (service) delay. We have $a = 1/\lambda$ and $s = 1/\mu$, where $\lambda$ and $\mu$ are the mean arrival rate and the mean service rate. The mean delay a packet spends in the queue (waiting plus service) can be computed as

$$T_Q = \frac{1}{\mu - \lambda} = \frac{a \cdot s}{a - s}$$

We can deduce that as $s \to a$, $T_Q \to \infty$, which means that if the mean delivery delay in the wireless channel is not constrained below the mean inter-arrival delay of incoming packets, $T_Q$ will quickly climb.
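A quick numerical check shows how sharply the M/M/1 delay $T_Q = as/(a-s) = 1/(\mu-\lambda)$ grows as the mean wireless delivery (service) delay $s$ approaches the mean inter-arrival delay $a$. The Python sketch below evaluates it for an assumed inter-arrival delay of 30 ms (one of the values used in Figure 4-7) and a few illustrative service delays; the function name is hypothetical.

```python
import math


def mm1_sojourn_ms(a_ms: float, s_ms: float) -> float:
    """Mean time in an M/M/1 queue, T_Q = a*s/(a - s) = 1/(mu - lambda).

    a_ms: mean inter-arrival delay (1/lambda); s_ms: mean service delay (1/mu).
    """
    if s_ms >= a_ms:
        return math.inf  # unstable queue: delay grows without bound
    return a_ms * s_ms / (a_ms - s_ms)


if __name__ == "__main__":
    for s in (10.0, 20.0, 25.0, 29.0):
        print(f"s = {s:4.1f} ms -> T_Q = {mm1_sojourn_ms(30.0, s):6.1f} ms")
```

Moving $s$ from 20 ms to 29 ms against a 30 ms inter-arrival delay raises $T_Q$ from 60 ms to 870 ms, which is why the wireless delivery delay has to be kept below the inter-arrival delay.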



4.2.3.2 ARQ aware playout buffer

For Wireless VoIP, the network delay is composed of the delivery delays in the wireline and wireless parts. In our design, besides adjusting the playout delay for each talkspurt, the ARQ aware playout buffer estimates the required delivery delay in the wireless and wireline parts separately. Figure 4-4 gives the timing notation associated with the playout buffer algorithm.

Since no noisy copy produced in the retransmission procedure is discarded, several copies of a packet may exist in the playout buffer. Let $a_i$ be the receiver timestamp of the first arrived copy of the $i$th packet, and $t_i$ its sender timestamp. The delivery delay of packet $i$ in the wireline network (denoted by $nw_i$) can be computed as $nw_i = a_i - t_i$. Let $r_i$ be the receiver timestamp of the last arrived copy. The delivery delay of packet $i$ in the wireless channel (denoted by $nc_i$) can be computed as $nc_i = r_i - a_i$; if no retransmission is required for packet $i$, $r_i = a_i$. However, recall that the waiting delay in the transmission queue will quickly climb if the mean delivery delay in the wireless channel is higher than the mean inter-arrival delay of the incoming packets (denoted by $\sigma_i$). The playout buffer should therefore limit $nc_i$ to $\sigma_i$ when $r_i - a_i \ge \sigma_i$. $\sigma_i$ can be estimated as:

$$\sigma_i = \alpha \cdot \sigma_{i-1} + (1 - \alpha) \cdot \lvert a_i - a_{i-1} \rvert$$

where $\alpha$ is the same constant as used in the estimation of $\hat{v}_i$; it is set to 0.99802 in the simulation.

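The network delay $n_i$ can then be summarized as:

$$n_i = nw_i + nc_i = \begin{cases} nw_i + (r_i - a_i), & r_i - a_i < \sigma_i \\ nw_i + \sigma_i, & r_i - a_i \ge \sigma_i \end{cases}$$

Figure 4-4 Timing associated with packet $i$ (diagram: Sender, Access Point and Receiver; sender timestamp $t_i$; receiver timestamps $a_i$ and $r_i$ across retry attempts; delays $nw_i$, $nc_i$, $n_i$ and $\hat{d}_i$)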
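The piecewise rule for $n_i$ amounts to capping the wireless part at $\sigma_i$, which the Python sketch below makes explicit. It is illustrative only: timestamps are assumed to be in milliseconds on a common clock, the function names are hypothetical, and the $\sigma_i$ update assumes the estimator takes the exponentially weighted form $\sigma_i = \alpha\sigma_{i-1} + (1-\alpha)\lvert a_i - a_{i-1}\rvert$ with $\alpha = 0.99802$ as in the simulation.

```python
def update_sigma(sigma_prev: float, a_i: float, a_prev: float,
                 alpha: float = 0.99802) -> float:
    """EWMA estimate of the mean inter-arrival delay sigma_i (assumed form)."""
    return alpha * sigma_prev + (1.0 - alpha) * abs(a_i - a_prev)


def arq_aware_network_delay(t_i: float, a_i: float, r_i: float,
                            sigma_i: float) -> float:
    """n_i = nw_i + nc_i, with the wireless part capped at sigma_i.

    t_i: sender timestamp; a_i / r_i: receiver timestamps of the first /
    last arrived copy of packet i.
    """
    nw = a_i - t_i                # wireline delivery delay
    nc = min(r_i - a_i, sigma_i)  # wireless delivery delay, limited to sigma_i
    return nw + nc


if __name__ == "__main__":
    sigma = update_sigma(sigma_prev=30.0, a_i=1020.0, a_prev=1000.0)
    print(arq_aware_network_delay(t_i=0.0, a_i=100.0, r_i=150.0, sigma_i=sigma))
```

Capping $nc_i$ means a packet whose retransmissions ran longer than $\sigma_i$ does not push the playout delay estimate up by the full excess, which keeps the wireless delay budget below the inter-arrival delay.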

The ARQ aware playout buffer differs from other algorithms only in the way it computes the network delay $n_i$. The mean network delay $\hat{d}_i$ can then be estimated with existing algorithms, e.g. the ‘adaptive’ algorithm proposed in [35]:

$$\text{if } \hat{d}_i \ge \mathit{delay\_threshold}: \quad \hat{d}_i = \min_{j \in S_i} \{ n_j \}$$

$$\text{else:} \quad \hat{d}_i = \begin{cases} \alpha \hat{d}_{i-1} + (1 - \alpha) n_i, & n_i \le \hat{d}_{i-1} \\ \beta \hat{d}_{i-1} + (1 - \beta) n_i, & n_i > \hat{d}_{i-1} \end{cases}$$

Details of this algorithm can be found in Chapter 2.
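For illustration, one update step of this ‘adaptive’ estimator can be written in Python, smoothing with α when $n_i \le \hat{d}_{i-1}$ and with β when $n_i > \hat{d}_{i-1}$, and falling back to the minimum recent delay ('min-delay') once the estimate reaches the threshold. This is a sketch under assumptions: the value of β, the delay threshold, the contents of `recent_delays` (standing in for the set $S_i$) and the order in which the smoothing step and the threshold test are applied are illustrative choices; Chapter 2 describes the actual algorithm.

```python
from typing import Sequence


def adaptive_delay_estimate(d_prev: float, n_i: float,
                            recent_delays: Sequence[float],
                            delay_threshold: float, alpha: float = 0.99802,
                            beta: float = 0.75) -> float:
    """One update of the mean network delay estimate d_i (illustrative).

    Exponential smoothing uses alpha when the delay falls or stays level and
    beta (assumed value) when it rises; if the smoothed estimate reaches
    delay_threshold, the minimum of recent_delays is used instead.
    """
    if n_i <= d_prev:
        d_i = alpha * d_prev + (1.0 - alpha) * n_i
    else:
        d_i = beta * d_prev + (1.0 - beta) * n_i
    if d_i >= delay_threshold and recent_delays:
        d_i = min(recent_delays)  # 'min-delay' fallback
    return d_i


if __name__ == "__main__":
    print(adaptive_delay_estimate(100.0, 500.0, [120.0, 90.0, 140.0],
                                  delay_threshold=250.0, beta=0.5))
```

With the ARQ aware buffer, the `n_i` passed in would be the capped network delay rather than the raw arrival delay of the last copy.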

4.3 Simulation Model and Experimental Results

As presented in Figure 4-5, the simulation model comprises the following components: a voice traffic model, an AMR encoder and decoder, a playout buffer, and a wireless network simulator that integrates the 802.11 SW-ARQ and a simple Bernoulli bit error model.


4.3.1 Wireless channel model

We employed a simple Bernoulli model for bit errors, which lead to packet corruption in both the payload and the packet header. The probability that a PHY layer packet is corrupted by bit errors, PER, can be computed as follows:

$$PER = 1 - (1 - BER)^{ph + pl}$$

Figure 4-5 The simulation model (blocks: Voice Traffic, Encoder, Wireless Network Simulator, Playout Buffer, Decoder, End-to-end Delay, Conversational Speech Quality Evaluation, MOSc)



where BER is the Bit Error Rate and ph is the packet overhead size from the physical level. For our simulations we used a value of 784 bits for ph: 24, 34, 20, 8 and 12 bytes at the PHY, MAC, IP, UDP and RTP layers respectively (no header compression is used). pl is the payload size, which is set to 32 bytes (256 bits), corresponding to an AMR 12.2 kbit/s voice frame.

Let $\omega$ denote the estimated playout delay, and $R_\omega$ the corresponding maximum retry limit constrained by $\omega$. We can also compute the probability $PKR$ of a packet being recovered after $R_\omega$ retransmissions as

$$PKR = PER^{R_\omega - 1} \cdot (1 - PER)$$

The probability that bit errors occur in the packet header, PHE, can be given by:

11)1(1+

⋅−−=pl

BERPHE ph

If a packet contains bit errors in its header in all $R_\omega$ transmissions, the speech data carried by this packet cannot be reused. The probability of this event, $PLS$, is:

$$PLS = PHE^{R_\omega}$$
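The channel model expressions can be evaluated directly. The Python sketch below computes PER for the overhead and payload sizes given above, together with PKR and PLS; PHE is passed in as a parameter rather than computed here. The function names are illustrative, and the retry limit $R$ stands for $R_\omega$.

```python
PH_BITS = (24 + 34 + 20 + 8 + 12) * 8  # PHY+MAC+IP+UDP+RTP overhead = 784 bits
PL_BITS = 32 * 8                       # 32-byte AMR 12.2 kbit/s payload = 256 bits


def per_from_ber(ber: float, ph: int = PH_BITS, pl: int = PL_BITS) -> float:
    """PER = 1 - (1 - BER)^(ph + pl) under the Bernoulli bit error model."""
    return 1.0 - (1.0 - ber) ** (ph + pl)


def pkr(per: float, r: int) -> float:
    """PKR = PER^(R-1) * (1 - PER) for retry limit R."""
    return per ** (r - 1) * (1.0 - per)


def pls(phe: float, r: int) -> float:
    """PLS = PHE^R: the header is corrupted in all R copies."""
    return phe ** r


if __name__ == "__main__":
    for ber in (1e-4, 5e-4, 1e-3):
        print(f"BER={ber:.0e}  PER={per_from_ber(ber):.4f}")
```

Because the 784-bit overhead dominates the 1040-bit packet, even a BER of $10^{-4}$ already corrupts roughly one packet in ten.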

4.3.2 Voice traffic model

The voice traffic can be represented simply by the on-off model [48]. In the on-off model a two-state chain is assumed: one state corresponds to talkspurts and the other to silence periods. The holding time in each state is assumed to follow an exponential distribution. In our simulation we selected means of 1.0 s and 1.5 s for the talkspurt and silence states respectively, as suggested in [49].
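The on-off source can be sketched in Python as below, generating the send times of speech frames over one trial. The 20 ms frame interval (one AMR frame per packet), the assumption that the source starts in a talkspurt, and the function name are illustrative choices rather than details of the simulator used here.

```python
import random


def on_off_frame_times(duration_s: float, rng: random.Random,
                       mean_on_s: float = 1.0, mean_off_s: float = 1.5,
                       frame_s: float = 0.02) -> list[float]:
    """Send times of speech frames from a two-state on-off source.

    Talkspurt and silence holding times are exponential with the given
    means; one frame is emitted every frame_s seconds during a talkspurt.
    """
    times: list[float] = []
    t = 0.0
    talking = True  # assume the source starts in a talkspurt
    while t < duration_s:
        mean = mean_on_s if talking else mean_off_s
        end = min(t + rng.expovariate(1.0 / mean), duration_s)
        if talking:
            k = 0
            while t + k * frame_s < end:
                times.append(t + k * frame_s)
                k += 1
        t = end
        talking = not talking
    return times


if __name__ == "__main__":
    frames = on_off_frame_times(200.0, random.Random(42))
    print(len(frames), "frames in 200 s")
```

With mean holding times of 1.0 s and 1.5 s, the source is active about 40% of the time on average.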

4.3.3 Speech quality evaluation

In our simulation model, we employed the conversational speech quality evaluation method [29] to quantify the performance of the different simulation strategies. This method combines PESQ and the E-Model to measure perceived speech quality, and the result is expressed as MOSc (Conversational Mean Opinion Score). The details of this method can be found in Chapter 2. In this method, bit errors in the payload, packet losses and delay all contribute to the degradation of the final evaluated speech quality.



4.3.4 Simulation results and analysis

We considered three strategies in the simulation study: Strategy A, SW-ARQ and the ‘adaptive’ playout buffer without the proposed cross-layer design; Strategy B, playout delay constrained ARQ with the ‘adaptive’ playout buffer; and Strategy C, playout delay constrained ARQ with the ARQ aware playout buffer. The simulation results were obtained by averaging the results of 30 trials with different random seeds to avoid the impact of particular packet loss or bit error locations. Each trial lasted 200 seconds, corresponding to 10,000 PDUs (each PDU encapsulating one RTP packet).

Figure 4-6 compares the overall packet loss ratio of these strategies. As BER increases, Strategy A discards many corrupted packets that cannot be fully recovered before their playout time. Strategies B and C follow the same policy regarding packet losses: both reuse noisy packets and discard only those packets that cannot reach the receiver before their playout time. As a result, Strategies B and C discard only a small percentage of packets compared with Strategy A, even when the wireless channel is very noisy.



Figure 4-6 Overall packet losses (packet loss ratio (%) vs. BER from $10^{-4}$ to $10^{-3}$; Strategies A, B and C)

Figure 4-7 End-to-end delay vs. inter-arrival delay in Strategy A (end-to-end delay (ms) vs. BER; inter-arrival delays of 26 ms, 28 ms and 30 ms)

Figure 4-8 End-to-end delay comparison (end-to-end delay (ms) vs. BER)

Figure 4-9 Conversational MOS comparison (conversational MOS vs. BER)




We also plotted end-to-end delays under different inter-arrival delays and wireless channel conditions in Figures 4-7 and 4-8, with a fixed 100ms delay in the wireline network. In Figure 4-7, the delay curves begin to spread at BER = $5 \times 10^{-4}$. The curve for the shortest inter-arrival delay (26ms) increases fastest. This agrees with the queue model: the closer the delivery delay in the wireless channel is to the mean inter-arrival delay, the higher the queuing delay and hence the end-to-end delay.

In Figure 4-8, we can see that the end-to-end delays of the strategies climb as BER increases. Strategy B performs slightly better than Strategy A as BER worsens, because Strategy B can terminate unnecessary retransmissions. Strategy C outperforms Strategies A and B with a more stable curve, as it avoids the accumulation of queuing delay. It should be noted that the delay curves dip at some points, where the ‘adaptive’ playout buffer switches to the ‘min-delay’ algorithm more frequently.

The performance enhancement achieved by the cross-layer design in terms of conversational Mean Opinion Score (MOSc) is presented in Figure 4-9. From Figure 4-9, we can see that the curves of Strategies A and B decrease significantly once BER exceeds $10^{-4}$. At a BER of around $10^{-3}$, Strategy A already reaches 1.0, the worst possible MOSc. In contrast, Strategy C, i.e. the cross-layer design, still achieves a MOSc of 3.0 at the same BER.

4.4 Summary

We investigated problems introduced by the IEEE 802.11 SW-ARQ protocol when it works with other components of a Wireless VoIP system (e.g. the transmitting queue and the adaptive playout buffer) in the layered protocol architecture, and proposed a cross-layer design as a solution to these problems. The proposed cross-layer design is composed of two correlated components: 1) playout delay constrained ARQ, in which a packet's playout time is the deadline of its retransmission procedure and, instead of corrupted packets simply being discarded, noisy copies of a packet can be combined and then played out; 2) the ARQ aware playout buffer, in which requirements on the delivery delay in the wireless channel (e.g. not causing queuing delay to accumulate) are considered in playout delay estimation. Through simulations, we showed that the proposed cross-layer design can improve the perceived speech quality of a Wireless VoIP system in terms of conversational Mean Opinion Score (MOSc). In our simulation, wireless channel errors are represented by a simple Bernoulli error model. Further improvements may be achieved by using multi-state error models to simulate transmission errors in the wireless channel.



CHAPTER 5

DISCUSSIONS, SUGGESTIONS for

FURTHER WORKS, and CONCLUSIONS

5.1 Discussions

Based on the research carried out in this study, we can now discuss the research questions raised at the beginning of this dissertation.

a. What are the impairment factors of Wireless VoIP applications?

For VoIP, the impairment factors have been identified as packet loss, delay, jitter and coding. For Wireless VoIP, bit errors constitute an additional impairment factor. If the whole packet carrying speech data is covered by checksums (UDP checksum or MAC checksum), the effect of bit errors perceived at the application level is again packet loss. However, if a partial checksum covering only the packet header is applied, the effect of bit errors can be either packet loss or speech distortion, depending on whether the bit errors fall inside the packet header or the payload.

b. What are the pros and cons of ARQ mechanisms? Is the performance of Wireless

VoIP System improved by ARQ mechanisms in terms of perceived speech quality?



Compared with FEC, which requires extra overhead, ARQ is a simple and efficient way to recover damaged packets. The main problems introduced by ARQ schemes are retransmission delay and jitter. Normally, the perceived speech quality of a Wireless VoIP system can be significantly enhanced by using variants of ARQ schemes, except in some cases, e.g. low BER combined with high wireline network delay. However, the use of ARQ should be kept within limits, i.e. the delay of the retransmission procedure should be constrained by the playout delay and by the inter-arrival delay of the transmitting queue at the access point.

c. How to optimize current ARQ schemes to improve speech quality? And how to map real-time network and wireless channel QoS parameters into ARQ protocol optimization?

ARQ schemes can be optimized for retransmission efficiency, e.g. by retransmitting only important speech packets in SPB ARQ, or by switching between No Retransmission and Full Retransmission in the proposed perceived speech quality driven scheme. Another optimized version of ARQ is playout delay constrained ARQ, which can terminate the retransmission procedure whenever necessary. All these optimizations need QoS parameters to make decisions. The QoS parameters may be obtained from other layers, namely the playout delay from the application layer, wireless channel performance from the physical layer, and other information from joint-layer analysis.

d. What are the effects of the interactions between ARQ mechanisms and other components of the Wireless VoIP system? How to cope with these effects if they are negative?

One example of the interactions between ARQ and other components of the Wireless VoIP system is the effect of the playout buffer on ARQ. If the retransmission procedure is only constrained by a fixed maximum retry limit, a high retry limit may let the procedure exceed the available delay budget, leading to unnecessary retransmissions and postponed subsequent packets, while with a low retry limit the procedure may be terminated prematurely, before the playout time is reached. The corresponding solution is the proposed playout delay constrained ARQ, in which the retransmission procedure is bounded by the estimated playout delay rather than by the retry limit alone.



e. How to make use of other packet error concealment technologies with ARQ? Or how

to use ARQ as a complement mechanism for other packet error concealment

technologies?

Using ARQ as a complement to other packet error recovery techniques, e.g. FEC, has been addressed in previous works. In this study, we investigated the performance of a cross-layer design that incorporates ARQ, majority-logic packet combining and partial checksums. The several noisy copies produced by the retransmission procedure of ARQ can be combined into a least noisy copy through packet combining. We conclude that a hybrid packet recovery solution can achieve a greater performance gain than a single technique, provided that the available packet error concealment techniques are appropriately scheduled.

f. How to establish a cross-layer framework in which we can optimize the QoS techniques located in different layers with a joint-layer analysis? And how to establish a profile of real-time predicted speech quality and QoS parameters collected from different layers, and eventually make this profile the scheduler of a cross-layer framework?

In this study, we have achieved a considerable improvement in speech quality simply by incorporating QoS parameters into the optimization of ARQ schemes with joint-layer analysis. More work is left for future studies to establish a perceived speech quality driven cross-layer framework, in which QoS parameters from different layers, evaluated speech quality fed back from the receiver, and the Service Level Agreement (SLA) all contribute to decisions about which packet error recovery techniques to use and how to combine them in a mutually aware way.

5.2 Suggestions for Further Works

In this study, the packet error recovery techniques of the cross-layer designs are driven by network parameters. In future work, we plan to improve the performance of cross-layer designs by establishing a more sophisticated perceived speech quality driven closed-loop packet error recovery scheduler. By closed-loop, we mean that the effects of the strategy issued by the cross-layer design are fed back and contribute to the next phase of strategy-making.

In fact, we expect the perceived speech quality driven packet error recovery scheduler to have the following abilities: 1) collect QoS parameters (e.g. BER, end-to-end delay, packet loss and bandwidth) from different layers to form a profile of the current communication environment; 2) taking into account the performance feedback, the current communication environment and the users' requirements (e.g. SLA), produce an optimized packet error recovery strategy; 3) schedule packet error recovery techniques according to the decided strategy, possibly using several techniques at the same time, e.g. link layer ARQ and application level FEC; 4) evaluate speech quality periodically and send it back, together with other QoS parameters, to the Scheduler as input for strategy-making.

Figure 5-1 illustrates the block diagram of the perceived speech quality driven packet error recovery scheduler. The Scheduler will be composed of three key components:

Packet error recovery scheduler: The central part of the framework is a real-time scheduler located in the mobile host or wireless handset. The scheduler takes into consideration variations in channel error rate, overall packet loss rate, speech quality feedback, etc., and tries to produce an optimal packet error recovery strategy for the local wireless channel that maximizes perceived speech quality with the available resources. The packet error recovery strategy may address which packet error recovery technique should be scheduled: FEC, ARQ, a lower coding rate, or a hybrid. The parameters of a specific technique can be provided as well, e.g. the coding rate or the delay budget for ARQ.

Figure 5-1 Perceived speech quality driven packet error recovery scheduler (diagram: Mobile Host with Speech Source, Encoder, Packet Error Recovery Scheduler and RTP/UDP/IP/MAC/PHY stack; Access Point with Booster; Fixed Host with RTP/UDP/IP/Ethernet stack, Adaptive Playout Buffer, Decoder producing degraded speech, and Perceived Speech Quality Evaluation; the Scheduler issues an error recovery strategy (FEC, ARQ, ...) based on QoS parameters and feedback (MOSc, end-to-end delay, etc.))



Booster: a Booster will be installed in the access point (AP). The Booster will be capable of per-flow service differentiation and admission control in the distributed coordination function (DCF). The Booster will also cooperate with the Scheduler to distinguish wireless channel delivery delay from network delay, and wireless channel packet losses from network congestion losses. Further, the Booster can be designed to change its QoS policies according to the packet error recovery strategy issued by the Scheduler.

Perceived speech quality evaluation and feedback: the perceived speech quality evaluation module is located in the receiver. This module will evaluate perceived speech quality at a specified interval and send the results, normally the conversational Mean Opinion Score, back to the sender together with other QoS parameters such as end-to-end delay. Such feedback information can be carried by RTCP reports or other forms of in-band signaling. It should be noted that Figure 5-1 only shows the scenario of a Mobile Host sending speech traffic. In fact, the perceived speech quality evaluation module should also be located in the Mobile Host itself, feeding evaluated quality indexes to the local Scheduler when the Mobile Host is receiving speech traffic in a conversation.

Besides these functional considerations, further details such as implementation complexity and resource requirements will be considered as well.

5.3 Conclusions

Perceived speech quality is crucial for the success of Wireless VoIP, a typical application in the upcoming wireless Internet or “4G”. The impairment factors for the perceived speech quality of a Wireless VoIP system can be summarized as packet loss, end-to-end delay, jitter, bit errors and coding. In this study, we investigated the problems introduced by ARQ schemes with regard to perceived speech quality. We tried to optimize the current ARQ protocol by mapping cross-layer QoS parameters into the scheduling and configuration of the ARQ retransmission procedure. We proposed a perceived speech quality driven retransmission scheme, which can switch to the most suitable retransmission scheme according to QoS parameters reported from lower or upper layers. We also developed a cross-layer framework in which the retransmission procedure of the ARQ protocol is bounded by the available playout delay, and the delivery delay in the wireless channel is constrained within the network delay estimation. Through simulation, we showed that these cross-layer techniques can achieve significant performance gains. However, the work done so far is far from perfect, and more effort is required in future studies towards an integrated perceived speech quality driven cross-layer framework.



REFERENCES

[1] IEEE Standards Department, IEEE 802.11 Standard for Wireless LAN, Medium

Access Control (MAC) and Physical Layer (PHY) Specification, 1999

[2] 3GPP2 C.S0001-B, Introduction to cdma2000 Spread Spectrum Systems, MAY

2002

[3] Schulzrinne H., Casner S., Frederick R. and Jacobson V., RFC 1889: RTP: A Transport Protocol for Real-Time Applications, 1996

[4] Tanenbaum A.S. Computer networks, Prentice-Hall, 1996, ISBN 0-13-394248-1

[5] Thomas J. Kostas et al., Real-Time Voice Over Packet-Switched Networks, IEEE Network, 12(1): 18-27, January 1998

[6] 3GPP TS.26090: Mandatory Speech Codec speech processing functions AMR

speech Codec; Transcoding functions, DEC 1999

[7] 3GPP2 C.S0014-0: Enhanced Variable Rate Codec (EVRC), JAN 1997

[8] S Rudkin, A Grace and M W Whybray, Real-time application on the Internet,

British Telecom Technology Journal,Vol 15,No2,April 1997.

[9] Agilent Technologies, Web ProForum Tutorials, Voice Quality (VQ) in

Converging Telephony and IP networks, http://www.iec.org/tutorials/voqual.pdf

[10] M. Veeraraghavan, N. Cocker, and T. Moors, Support of voice services in IEEE

802.11 wireless LANs, Proc. Infocom, Anchorage, Alaska, April 2001

[11] M. Luby, L. Vicisano, J. Gemmell, L. Rizzo, M. Handley and J. Crowcroft,

RFC3453: The Use of Forward Error Correction (FEC) in Reliable Multicast, DEC

2002

[12] Moo Young Kim, Renat Vafin, Packet-Loss Recovery Techniques for VoIP,

Technical Report, Royal Institute of Technology (KTH), Sweden

[13] Wenyu Jiang, Henning Schulzrinne, Comparison and Optimization of Packet

Loss Repair Methods on VoIP Perceived Quality under Bursty Loss , NOSSDAV 2002

[14] C. S. Perkins, O. Hodson and V. Hardman, A Survey of Packet-Loss Recovery

Techniques for Streaming Audio, IEEE Network Magazine, SEP/OCT 1998.

[15] L. A. Larzon, M. Degermark, and S. Pink, “The UDP Lite Protocol,” Internet

Draft draft-ietf-tsvwg-udp-lite-00.txt, Jan. 2002.

[16] Leon-Garcia and Widjaja, Communication Networks: Fundamental Concepts and



Key Architectures, McGraw-Hill, 2000, ISBN 0070228396

[17] Qian Zhang, Wenwu Zhu and Ya-Qin Zhang, A Cross-layer QoS-Supporting Framework for Multimedia Delivery over Wireless Internet, Proc. 12th Packet Video Workshop (PV2002), Pittsburgh, USA, 2002

[18] Sanjay Shakkottai, Theodore S. Rappaport and Peter C. Karlsson, Cross-layer Design for Wireless Networks, Technical Report Submitted for Journal Publication, 2003

[19] S. Krishnamachari, M. van der Schaar, S. Choi and X. Xu, Video Streaming over Wireless LANs: A Cross-layer Approach, Proc. Packet Video, Nantes, France, APR 2003

[20] Yo Huh, Ming Hu, Martin Reisslein and Junshan Zhang, MAI-JSQ: A Cross-Layer Design for Real-Time Video Streaming in Wireless Networks, Technical Report, Telecommunications Research Center, Dept. of Electrical Eng., Arizona State University, AUG 2002

[21] H. Sanneck, N. Tuong L. Le et al., Selective Packet Prioritization for Wireless Voice over IP, 4th Int. Symp. on Wireless Personal Multimedia Communications, Denmark, 2001

[22] Z. Li, L. Sun, Z. Qiao and E. Ifeachor, Perceived Speech Quality Driven Retransmission Mechanism for Wireless VoIP, Proc. IEE 3G 2003, pp. 395-399, London, UK, JUN 2003

[23] ITU-T Recommendation P.862, Perceptual evaluation of speech quality (PESQ), an objective method for end-to-end speech quality assessment of narrowband telephone networks and speech codecs

[24] ITU-T Recommendation G.107 (05/2000), The E-model, a computational model for use in transmission planning

[25] ITU-T Recommendation P.830, Subjective Performance Assessment of Telephone-band and Wideband Digital Codecs

[26] Athina P. Markopoulou, Assessing the Quality of Multimedia Communications over Internet Backbone Networks, PhD thesis, Department of Electrical Engineering, Stanford University, USA, OCT 2002

[27] ITU-T Recommendation G.108, Application of the E-model: A planning guide, SEP 1998

[28] ITU-T Recommendation G.113, Transmission impairments due to speech processing, FEB 2001



[29] Lingfen Sun and Emmanuel Ifeachor, New Methods for Voice Quality Evaluation for IP Networks, Proc. 18th International Teletraffic Congress (ITC18), Berlin, Germany, SEP 2003

[30] R. Ramjee, J. Kurose, D. Towsley and H. Schulzrinne, Adaptive playout mechanisms for packetized audio applications in wide-area networks, Proc. IEEE INFOCOM, vol. 2, pp. 680-688, 1994

[31] S. B. Moon, J. Kurose and D. Towsley, Packet audio playout delay adjustment: performance bounds and algorithms, ACM/Springer Multimedia Systems, vol. 5, pp. 17-28, JAN 1998

[32] L. Sun and E. C. Ifeachor, Prediction of Perceived Conversational Speech Quality and Effects of Playout Buffer Algorithms, Proc. IEEE ICC 2003

[33] Kouhei Fujimoto, Shingo Ata and Masayuki Murata, Playout control for streaming applications by statistical delay analysis, Proc. IEEE ICC, vol. 8, pp. 2337-2342, JUN 2001

[34] C. Hoene, I. Carreras and A. Wolisz, Voice over IP: Improving the Quality over Wireless LAN by Adopting a Booster Mechanism - An Experimental Approach, Proc. SPIE 2001 - Voice over IP (VoIP) Technology, pp. 157-, Denver, Colorado, USA, 2001

[35] L. F. Sun, G. Wade, B. M. Lines and E. C. Ifeachor, Impact of Packet Loss Location on Perceived Speech Quality, Proc. 2nd IP-Telephony Workshop (IPTEL '01), Columbia University, New York, pp. 114-122, 2001

[36] The Network Simulator - ns-2, http://www.isi.edu/nsnam/ns/

[37] ITU-T Recommendation G.114, One-Way Transmission Time, FEB 1999

[38] Florian Hammer, Peter Reichl, Tomas Nordstrom and Gernot Kubin, Corrupted Speech Data Considered Useful, Proc. First ISCA Tutorial and Research Workshop on Auditory Quality of Systems, Mont Cenis, Germany, APR 2003

[39] E. Uhlemann, T. M. Aulin, L. K. Rasmussen and P.-A. Wiberg, Concatenated hybrid ARQ - A flexible scheme for wireless real-time communication, IEEE Real-Time and Embedded Technology and Applications Symposium, SEP 2002

[40] Christos Papadopoulos and Gurudatta M. Parulkar, Retransmission-Based Error Control for Continuous Media Applications, Proc. NOSSDAV, 1996

[41] Guijin Wang, Qian Zhang, Wenwu Zhu and Ya-Qin Zhang, Channel-Adaptive Error Control for Scalable Video over Wireless Channel, 7th International Workshop on Mobile Multimedia Communications (MoMuC), Japan, OCT 2000



[42] Qingwen Liu, Shengli Zhou and Georgios B. Giannakis, Cross-Layer Combining of Adaptive Modulation and Coding with Truncated ARQ over Wireless Links, IEEE Transactions on Wireless Communications, 2004 (to appear)

[43] Richard Han and David Messerschmitt, A Progressively Reliable Transport Protocol for Interactive Wireless Multimedia, ACM Multimedia Systems Journal, MAR 1999

[44] Stephen B. Wicker, Adaptive Rate Error Control Through the Use of Diversity Combining and Majority-Logic Decoding in a Hybrid-ARQ Protocol, IEEE Transactions on Communications, Vol. 39, No. 3, MAR 1991

[45] E. Page, Queuing system in OR, the Butterworths Group, 1972, ISBN 0408702370

[46] F. Cali, M. Conti and E. Gregori, IEEE 802.11 wireless LAN: Capacity analysis and protocol enhancement, Proc. IEEE INFOCOM, 1998

[47] J. Rosenberg, L. Qiu and H. Schulzrinne, Integrating Packet FEC into Adaptive Voice Playout Buffer Algorithms on the Internet, Proc. IEEE INFOCOM 2000, vol. 3, pp. 1705-1714

[48] P. Brady, A Technique for Investigating On-Off Patterns of Speech, Bell System Technical Journal, 44(1): 1-22, JAN 1965

[49] ITU-T Recommendation P.59, Telephone transmission quality objective measuring apparatus: Artificial conversational speech

[50] Shyam S. Chakraborty, Erkki Yli-Juuti and Markku Liinaharja, An ARQ Scheme with Packet Combining, IEEE Communications Letters, 1998

[51] E. Uhlemann, T. M. Aulin, L. K. Rasmussen and P.-A. Wiberg, Packet Combining and Doping in Concatenated Hybrid ARQ Schemes Using Iterative Decoding, Proc. IEEE WCNC 2003

[52] Wenyu Jiang and Henning Schulzrinne, Comparison and Optimization of Packet Loss Repair Methods on VoIP Perceived Quality under Bursty Loss, NOSSDAV 2002



APPENDICES

[APPENDIX A] ns-2 Extensions for ARQ Retry Limit Control

/* Modifications in mac-802_11.h */
class Mac802_11 : public Mac {
public:
    Mac802_11(PHY_MIB* p, MAC_MIB *m);
    static int retr;   // retry limit, set from Tcl via "Mac/802_11 retrNo N"
    // …
};

// Tcl hooks for the simulator
static class Mac802_11Class : public TclClass {
public:
    Mac802_11Class() : TclClass("Mac/802_11") {}
    TclObject* create(int, const char*const*) {
        return (new Mac802_11(&PMIB, &MMIB));
    }
    virtual void bind();
    virtual int method(int argc, const char*const* argv);
} class_mac802_11;

/* Modifications in mac-802_11.cc */

// Definition of the static retry limit declared in the header
// (required once for linking; the value is set from Tcl before use).
int Mac802_11::retr = 0;

void Mac802_11Class::bind()
{
    // Call to base class bind() must precede add_method()
    TclClass::bind();
    add_method("retrNo");
}

int Mac802_11Class::method(int ac, const char*const* av)
{
    Tcl& tcl = Tcl::instance();
    int argc = ac - 2;
    const char*const* argv = av + 2;

    if (argc == 2) {
        // "Mac/802_11 retrNo" returns the current retry limit
        if (strcmp(argv[1], "retrNo") == 0) {
            tcl.resultf("%d", Mac802_11::retr);
            return (TCL_OK);
        }
    } else if (argc == 3) {
        // "Mac/802_11 retrNo N" sets the static retry limit
        if (strcmp(argv[1], "retrNo") == 0) {
            Mac802_11::retr = atoi(argv[2]);
            return (TCL_OK);
        }
    }
    return TclClass::method(ac, av);
}

// Retransmission routine
void Mac802_11::RetransmitDATA()
{
    struct hdr_cmn *ch;
    struct hdr_mac802_11 *mh;
    u_int32_t *rcount, *thresh;

    assert(mhBackoff_.busy() == 0);
    assert(pktTx_);
    assert(pktRTS_ == 0);

    ch = HDR_CMN(pktTx_);
    mh = HDR_MAC802_11(pktTx_);

    /*
     * Broadcast packets don't get ACKed and therefore
     * are never retransmitted.
     */
    if ((u_int32_t)ETHER_ADDR(mh->dh_da) == MAC_BROADCAST) {
        // These lines are commented out so that the ARQ mechanism
        // can be used for any topology.
        //Packet::free(pktTx_); pktTx_ = 0;
        /*
         * Backoff at end of TX.
         */
        //rst_cw();
        //mhBackoff_.start(cw_, is_idle());
        //return;
    }

    macmib_->ACKFailureCount++;

    if ((u_int32_t) ch->size() <= macmib_->RTSThreshold) {
        rcount = &ssrc_;
        thresh = &macmib_->ShortRetryLimit;
    } else {
        rcount = &slrc_;
        // thresh must point at valid storage before it is written through;
        // the long retry limit is then overridden with the value set from Tcl.
        thresh = &macmib_->LongRetryLimit;
        *thresh = Mac802_11::retr;
        printf("threshold=%u\n", *thresh);
    }

    (*rcount)++;

    if (*rcount > *thresh) {
        macmib_->FailedCount++;
        /* tell the callback the send operation failed
           before discarding the packet */
        hdr_cmn *ch = HDR_CMN(pktTx_);
        if (ch->xmit_failure_) {
            ch->size() -= ETHER_HDR_LEN11;
            ch->xmit_reason_ = XMIT_REASON_ACK;
            ch->xmit_failure_(pktTx_->copy(), ch->xmit_failure_data_);
        }
        discard(pktTx_, DROP_MAC_RETRY_COUNT_EXCEEDED);
        pktTx_ = 0;
        printf("(%d)DATA discarded: count exceeded\n", sta_seqno_);
        *rcount = 0;
        rst_cw();
    } else {
        struct hdr_mac802_11 *dh;
        dh = HDR_MAC802_11(pktTx_);
        dh->dh_fc.fc_retry = 1;
        sendRTS(ETHER_ADDR(mh->dh_da));
        //printf("(%d)retxing data:%x..sendRTS..\n", index_, pktTx_);
        inc_cw();
        mhBackoff_.start(cw_, is_idle());
    }
}



[APPENDIX B] ns-2 Simulation Script for Per Packet Control of ARQ

# wireless2.tcl
# simulation of a wired-cum-wireless scenario consisting of 2 wired nodes
# connected to a wireless domain through a base-station node.
#==================================================================
# Define options
#==================================================================
set opt(chan)         Channel/WirelessChannel    ;# channel type
set opt(prop)         Propagation/TwoRayGround   ;# radio-propagation model
set opt(netif)        Phy/WirelessPhy            ;# network interface type
set opt(mac)          Mac/802_11                 ;# MAC type
set opt(ifq)          Queue/DropTail/PriQueue    ;# interface queue type
set opt(ll)           LL                         ;# link layer type
set opt(ant)          Antenna/OmniAntenna        ;# antenna model
set opt(ifqlen)       25000                      ;# max packets in ifq
set opt(nn)           1                          ;# number of mobile nodes
set opt(adhocRouting) DSDV                       ;# routing protocol
set opt(x)            500                        ;# x coordinate of topology
set opt(y)            500                        ;# y coordinate of topology
set opt(seed)         [lindex $argv 0]           ;# seed for random number gen.
set opt(stop)         20000                      ;# time to stop simulation
set opt(utp1-start)   2.0
set num_wired_nodes   2
set num_bs_nodes      1
# ================================================================

# check for boundary parameters and random seed
if { $opt(x) == 0 || $opt(y) == 0 } {
    puts "No X-Y boundary values given for wireless topology\n"
}
if {$opt(seed) > 0} {
    puts "Seeding Random number generator with $opt(seed)\n"
    ns-random $opt(seed)
}

# create simulator instance
set ns_ [new Simulator]

# packet error rate of the wireless link (second command-line argument)
set erate [lindex $argv 1]
puts "erate $erate \n"

proc UniformErr {} {
    global erate
    set em [new ErrorModel]
    $em set rate_ $erate
    $em unit pkt
    $em ranvar [new RandomVariable/Uniform]
    return $em
}
$ns_ node-config -IncomingErrProc UniformErr -OutgoingErrProc UniformErr

# set up for hierarchical routing
$ns_ node-config -addressType hierarchical
AddrParams set domain_num_ 2             ;# number of domains
lappend cluster_num 2 1                  ;# number of clusters in each domain
AddrParams set cluster_num_ $cluster_num
lappend eilastlevel 1 1 2                ;# number of nodes in each cluster
AddrParams set nodes_num_ $eilastlevel   ;# of each domain

set tracefd [open wireless2.tr w]
#set namtrace [open wireless2.nam w]
$ns_ trace-all $tracefd
#$ns_ namtrace-all-wireless $namtrace $opt(x) $opt(y)

# Create topography object
set topo [new Topography]
#set mac80211 [new Mac/802_11]

# define topology
$topo load_flatgrid $opt(x) $opt(y)

# create God
create-god [expr $opt(nn) + $num_bs_nodes]

# create wired nodes
set temp {0.0.0 0.1.0}   ;# hierarchical addresses for wired domain
for {set i 0} {$i < $num_wired_nodes} {incr i} {
    set W($i) [$ns_ node [lindex $temp $i]]
}

# configure for base-station node
$ns_ node-config -adhocRouting $opt(adhocRouting) \
                 -llType $opt(ll) \
                 -macType $opt(mac) \
                 -ifqType $opt(ifq) \
                 -ifqLen $opt(ifqlen) \
                 -antType $opt(ant) \
                 -propType $opt(prop) \
                 -phyType $opt(netif) \
                 -channelType $opt(chan) \
                 -macTrace OFF \
                 -wiredRouting ON \
                 -agentTrace ON \
                 -routerTrace OFF \
                 -topoInstance $topo

# create base-station node
set temp {1.0.0 1.0.1 1.0.2 1.0.3}   ;# hier address to be used for wireless domain
set BS(0) [$ns_ node [lindex $temp 0]]
$BS(0) random-motion 0               ;# disable random motion

# provide some co-ord (fixed) to base station node
$BS(0) set X_ 1.0
$BS(0) set Y_ 2.0
$BS(0) set Z_ 0.0

# configure for mobile nodes
$ns_ node-config -wiredRouting OFF
for {set j 0} {$j < $opt(nn)} {incr j} {
    set node_($j) [ $ns_ node [lindex $temp [expr $j+1]] ]
    $node_($j) base-station [AddrParams addr2id [$BS(0) node-addr]]
}

# create links between wired and BS nodes
$ns_ duplex-link $W(0) $W(1) 5Mb 2ms DropTail
$ns_ duplex-link $W(1) $BS(0) 5Mb 2ms DropTail
$ns_ duplex-link-op $W(0) $W(1) orient down
$ns_ duplex-link-op $W(1) $BS(0) orient left-down

# set up a UDP/CBR connection from the mobile node to the base station
set udp1 [new Agent/UDP]
$udp1 set class_ 2
set null1 [new Agent/Null]
set cbr1 [new Application/Traffic/CBR]
$cbr1 set packetSize_ 32
$cbr1 set interval_ 0.020
$cbr1 attach-agent $udp1
$cbr1 set maxpkts_ 1                 ;# per-packet control
$ns_ attach-agent $node_(0) $udp1
$ns_ attach-agent $BS(0) $null1
$ns_ connect $udp1 $null1

# Define initial node position in nam
for {set i 0} {$i < $opt(nn)} {incr i} {
    # 5 defines the node size in nam; adjust it according to your scenario.
    # The function must be called after the mobility model is defined.
    $ns_ initial_node_pos $node_($i) 5
}

# read in per-packet information, i.e. voiced or unvoiced
set pattern_file_name abmixed.vo
set pattern_fid [open $pattern_file_name r]
set cbrtime 0.0
set j -1
puts "Reading Speech Property Marking files.............."
while {[gets $pattern_fid current_line] >= 0} {
    # skip blank or malformed lines instead of reusing the previous flag
    if {[scan $current_line "%d" voice_flag] != 1} continue
    incr j
    set r($j) $voice_flag
}
close $pattern_fid

set i 0
while {$i <= $j} {
    $ns_ at [expr $i*0.027] "Mac/802_11 retrNo 9;$cbr1 start"
    incr i
}
set opt(stop) [expr $i*0.02+10000]

$ns_ at $opt(stop) "$cbr1 stop"

# Tell all nodes when the simulation ends
# (opt(stop) is now a floating-point value, so offsets are added with expr
#  rather than by appending digits to the string)
for {set i 0} {$i < $opt(nn)} {incr i} {
    $ns_ at [expr {$opt(stop) + 0.00001}] "$node_($i) reset"
}
$ns_ at [expr {$opt(stop) + 0.00002}] "$BS(0) reset"
$ns_ at [expr {$opt(stop) + 0.0002}] "puts \"NS EXITING...\" ; $ns_ halt"
$ns_ at [expr {$opt(stop) + 0.01}] "stop"

proc stop {} {
    global ns_ tracefd
    #$ns_ flush-trace
    close $tracefd
    # the nam trace is not opened above, so it is not closed here
    #close $namtrace
    #exec nam wireless2-out.nam &
    exit 0
}

puts "Starting Simulation..."
$ns_ run



[APPENDIX C] C Code for Majority-Logic Packet Combining

Packet combining techniques are used in the decoding process in packet-switched networks that use ARQ protocols. The motivation for packet combining is that a received packet always contains at least a small amount of useful information. This information can be used in conjunction with other received copies of the packet to obtain an estimate of the transmitted data that is more reliable than that obtainable from any single copy. There are two basic approaches to combining multiple received packets: code combining and diversity combining.

Diversity combining differs from code combining in that multiple copies of a packet encoded at rate R are combined bit by bit to create a single codeword of the original rate R code, whereas code combining concatenates the received copies into a codeword of a lower-rate code. Each bit in the resulting packet is made more reliable through the receipt of multiple copies of that bit. Although it is not as powerful as code combining, diversity combining is much simpler to implement.

Majority-logic diversity combining uses multiple copies of each transmitted bit in a voting scheme to obtain a single, more reliable version of each bit.

Majority-logic packet combining rule

The majority-logic packet combining rule is a simplified form of the majority-logic decoding rule [44]. Let $J$ be the number of received copies of a packet, let $B_{i,k}$, $1 \le k \le J$, be the set of bits at the same position $i$ in the $J$ copies, and let $\eta$ be the number of bits with the value one in the set $B_{i,k}$. If $\eta \ge \lfloor J/2 \rfloor + 1$, bit $B_i$ in the final combined packet is determined to have the value one. If $\eta \le \lfloor (J-1)/2 \rfloor$, $B_i$ is determined to have the value zero. Note that if $J$ is even, $\eta$ may equal $J/2$, so that $\lfloor (J-1)/2 \rfloor < \eta < \lfloor J/2 \rfloor + 1$ and neither rule applies. In this case, we can make $J$ odd through a further retransmission, or accept a 50 percent risk of error if there is no time for further retransmissions.
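The rule above can be expressed directly in C. This sketch stores one bit per byte (values 0 or 1), reports a tie when $J$ is even and $\eta = J/2$, and resolves ties by keeping the first copy's bit (the 50 percent risk option) while returning the number of ties, so that a caller could instead request another retransmission. It is a stand-alone illustration with hypothetical names, separate from the program listed below.

```c
#include <assert.h>
#include <stddef.h>

/* Outcome of combining one bit position across J received copies. */
typedef enum { ML_ZERO = 0, ML_ONE = 1, ML_TIE = 2 } ml_bit;

/* Majority-logic rule for bit position i:
 *   eta >= floor(J/2) + 1        -> 1
 *   eta <= floor((J-1)/2)        -> 0
 *   otherwise (J even, eta==J/2) -> tie
 * copies[k][i] holds bit i of copy k as 0 or 1. */
ml_bit ml_combine_bit(const unsigned char *const *copies, size_t J, size_t i)
{
    assert(J > 0);
    size_t eta = 0;                     /* number of copies with a one */
    for (size_t k = 0; k < J; k++)
        eta += copies[k][i] != 0;
    if (eta >= J / 2 + 1)
        return ML_ONE;
    if (eta <= (J - 1) / 2)
        return ML_ZERO;
    return ML_TIE;
}

/* Combine a whole packet of nbits bits into out[]. Ties keep the bit of
 * the first copy; the return value is the number of tied positions. */
size_t ml_combine_packet(const unsigned char *const *copies, size_t J,
                         size_t nbits, unsigned char *out)
{
    size_t ties = 0;
    for (size_t i = 0; i < nbits; i++) {
        ml_bit b = ml_combine_bit(copies, J, i);
        if (b == ML_TIE) {
            ties++;
            out[i] = copies[0][i] != 0;
        } else {
            out[i] = (unsigned char)b;
        }
    }
    return ties;
}
```

With three copies every position has a strict majority, so no ties are reported; with two copies, every position where the copies disagree is a tie.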



C code for majority-logic packet combining and producing bit errors in payload:

#ifdef HAVE_CONFIG_H
#include <config.h>
#endif
#include <stdio.h>
#include <stdlib.h>   /* srand48()/drand48() are POSIX (XSI) functions */

enum RXFrameType {
    RX_SPEECH_GOOD = 0,
    RX_SPEECH_PROBABLY_DEGRADED,
    RX_SPARE,
    RX_SPEECH_BAD,
    RX_SID_FIRST,
    RX_SID_UPDATE,
    RX_SID_BAD,
    RX_NO_DATA,
    RX_N_FRAMETYPES   /* number of frame types */
};

enum TXFrameType {
    TX_SPEECH = 0,
    TX_SID_FIRST,
    TX_SID_UPDATE,
    TX_NO_DATA,
    TX_N_FRAMETYPES   /* number of frame types */
};

typedef short Word16;

/* AMR serial frame: frame type + 244 speech bits (12.2 kbit/s) + 4 + 1 words */
#define SERIAL_SIZE (1 + 244 + 4 + 1)
#define MAX_COPIES  6   /* maximum number of retransmitted copies */

int main(int argc, char *argv[])
{
    FILE *file_serial, *lossfile, *losspattern;
    Word16 serial[SERIAL_SIZE], serial_noisy[MAX_COPIES][SERIAL_SIZE];
    int frame, iCombine, i, j, iseed, iMajority, erase_flag;
    double rm, errate;
    char buf[50];

    if (argc < 6) {
        printf("Usage: crpacket amr_encodedfile loss_pattern_file output_lossfile Error_rate randomseed\n");
        exit(EXIT_FAILURE);
    }
    if ((file_serial = fopen(argv[1], "rb")) == NULL) {
        printf("%s cannot be opened for read\n", argv[1]);
        exit(EXIT_FAILURE);
    }
    if ((lossfile = fopen(argv[3], "wb")) == NULL) {
        printf("%s cannot be opened for write\n", argv[3]);
        exit(EXIT_FAILURE);
    }
    if ((losspattern = fopen(argv[2], "rb")) == NULL) {
        printf("%s cannot be opened for read\n", argv[2]);
        exit(EXIT_FAILURE);
    }

    errate = atof(argv[4]);   /* bit error rate applied to each received copy */
    iseed = atoi(argv[5]);
    frame = 0;
    srand48(iseed);

    while (fread(serial, sizeof(Word16), SERIAL_SIZE, file_serial) == SERIAL_SIZE) {
        printf("\nframe=%d ", ++frame);

        /* one line per frame: <erase_flag> <number of retransmitted copies> */
        if (fgets(buf, sizeof buf, losspattern) == NULL ||
            sscanf(buf, "%d %d", &erase_flag, &iCombine) != 2) {
            fprintf(stderr, "\nloss pattern file ended or is malformed at frame %d\n", frame);
            break;
        }
        if (iCombine < 0 || iCombine > MAX_COPIES)
            iCombine = 0;

        if (erase_flag == 0) {
            serial[0] = TX_NO_DATA;          /* packet lost: mark frame as no data */
        } else if (iCombine != 0) {
            /* generate iCombine independently corrupted copies */
            for (j = 0; j < iCombine; j++) {
                for (i = 0; i < SERIAL_SIZE; i++)
                    serial_noisy[j][i] = serial[i];
                for (i = 1; i < SERIAL_SIZE - 5; i++) {
                    rm = drand48();
                    if (rm < errate)
                        serial_noisy[j][i] = !serial_noisy[j][i];
                }
            }
            /* corrupt the original packet (Bernoulli random bit errors) */
            for (i = 1; i < SERIAL_SIZE - 5; i++) {
                rm = drand48();
                if (rm < errate)
                    serial[i] = !serial[i];
            }
            /* majority-logic packet combining over J = iCombine + 1 copies:
               flip a bit when fewer than half of the copies agree with it;
               on a tie (J even) the bit of the original copy is kept */
            for (i = 1; i < SERIAL_SIZE - 5; i++) {
                iMajority = 1;               /* the original copy agrees with itself */
                for (j = 0; j < iCombine; j++)
                    if (serial[i] == serial_noisy[j][i])
                        iMajority++;
                if (2 * iMajority < iCombine + 1) {
                    serial[i] = !serial[i];
                    printf("combined ");
                }
            }
        }

        if (fwrite(serial, sizeof(Word16), SERIAL_SIZE, lossfile) != SERIAL_SIZE)
            fprintf(stderr, "\nerror writing output file: %s\n", argv[3]);
    }

    fflush(lossfile);
    fclose(file_serial);
    fclose(lossfile);
    fclose(losspattern);
    return EXIT_SUCCESS;
}



[APPENDIX D] List of Items Included in the Appended CD

The following items are included in the appended CD:

Thesis

The e-copy of the thesis (Word/PDF)

Papers

Papers published or to be published (Word/PDF)

References

Papers/Documents referenced in the thesis

Presentation

Slides presented in the MRes Viva

Software

Programs developed for the project, including MATLAB/C++ source code and scripts, together with related software tools (e.g. AMR codec and PESQ) and data (e.g. ITU-T speech files).



[APPENDIX E] Published Papers

[1] Z.Li, L.Sun, Z.Qiao and E.Ifeachor, Perceived Speech Quality Driven

Retransmission Mechanism for Wireless VoIP, Proc. IEE 3G 2003 pp395-399, London,

UK, JUN 2003


PERCEIVED SPEECH QUALITY DRIVEN RETRANSMISSION MECHANISM FOR WIRELESS VoIP

Z Li, L Sun, Z Qiao and E Ifeachor Department of Communication and Electronic Engineering

University of Plymouth, Plymouth, U.K.

Abstract: Effective link layer retransmission mechanisms in wireless networks are important as they can reduce packet loss due to bit errors. For wireless voice over IP (VoIP), a key question that needs to be addressed in order to provide the best possible perceived speech quality is how to use retransmission schemes to recover corrupted packets while avoiding excessive retransmission delays. The contributions of this paper are twofold. First, we use an objective measure of perceived conversational speech quality (MOSc) as a metric to evaluate the performance of three current retransmission schemes (i.e. No Retransmission, Speech Property-Based Retransmission and Full Retransmission), while considering the impact of retransmission jitters. Our findings indicate that the performance of the retransmission mechanisms is a function of both wireless link quality and delay introduced in the wireline network. Second, we propose a new perceived speech quality driven retransmission mechanism which may be used to achieve optimum perceived speech quality for wireless VoIP (in terms of the objective mean opinion score) by switching to the most suitable retransmission scheme under different communication conditions.

I. INTRODUCTION

Quality of Service (QoS) support for voice over IP (VoIP) in wireless/mobile networks is an important issue for technical and commercial reasons. However, speech quality for VoIP suffers from high packet loss rates and other impairments in the wireless link. Retransmission mechanisms, such as automatic repeat request (ARQ), have been incorporated in wireless and cellular networks to retransmit lost packets and so improve the performance of data transmission over wireless links. In wireless networks such as 802.11b [1], the retransmission mechanism is a simple Stop & Wait algorithm implemented at the Medium Access Control (MAC) layer, in which each transmitted packet must be acknowledged before the next packet can be sent. If the sender of a frame does not receive an acknowledgement within a timeout period, it retransmits the frame until the frame is acknowledged or a maximum retransmission limit is reached. When the wireless link quality is poor, retransmission of MAC frames can effectively recover corrupted packets that contain bit errors.
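A minimal C sketch of the Stop & Wait behaviour described above shows how the retry limit bounds both the residual loss and the extra delay that each retransmission adds. It assumes a hypothetical per-attempt channel callback in place of the real radio, and a fixed cost per attempt (frame airtime plus ACK timeout) with backoff ignored; `saw_send`, `attempt_ok_fn` and `fail_first_n` are illustrative names, not part of the 802.11 standard or of ns-2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical channel hook: true if attempt number `attempt`
 * (0 = first transmission) is received and ACKed. */
typedef bool (*attempt_ok_fn)(int attempt, void *ctx);

typedef struct {
    bool   delivered;  /* was the frame eventually ACKed?            */
    int    attempts;   /* transmissions made (1 .. retry_limit + 1)  */
    double delay_ms;   /* link delay accumulated over all attempts   */
} saw_result;

/* Stop & Wait: transmit, wait for the ACK, and retransmit on timeout
 * until the frame is ACKed or retry_limit retransmissions have been
 * made. attempt_ms is an assumed fixed cost per attempt. */
saw_result saw_send(int retry_limit, double attempt_ms,
                    attempt_ok_fn ok, void *ctx)
{
    assert(retry_limit >= 0 && ok != NULL);
    saw_result r = { false, 0, 0.0 };
    for (int attempt = 0; attempt <= retry_limit; attempt++) {
        r.attempts++;
        r.delay_ms += attempt_ms;
        if (ok(attempt, ctx)) {
            r.delivered = true;
            break;
        }
    }
    return r;
}

/* Example channel: the first *(int *)ctx attempts are corrupted. */
bool fail_first_n(int attempt, void *ctx)
{
    return attempt >= *(const int *)ctx;
}
```

With a retry limit of 6 and two corrupted attempts, the frame is delivered on the third transmission; if every attempt fails, the frame is dropped after seven transmissions, having added seven attempt times of delay.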

However, retransmission schemes may introduce excessive delays, which have significant adverse effects on delay-sensitive real-time applications such as VoIP. A simple retransmission scheme can therefore degrade perceived speech quality in VoIP, and there is a tradeoff between packet loss and delay across retransmission schemes. Improved retransmission mechanisms such as the Hybrid loss recovery scheme [2] and Speech Property-Based ARQ (SPB-ARQ) [3] have been proposed to reduce speech distortion by protecting packets that are perceptually more relevant. However, the evaluation of these schemes is limited to listening-only quality assessment and does not consider the impact of delay, which is important for conversation and interactivity. Further, these schemes do not consider the impact of retransmission jitters. Since adaptive jitter buffers discard retransmitted packets that arrive too late for playout, the character of the retransmission jitters introduced by different retransmission schemes should be considered.

The primary aim of the study reported in this paper is to investigate new retransmission mechanisms to improve speech quality for wireless VoIP. The contributions of the paper are twofold. First, we propose the use of a perceived conversational speech quality assessment method [4] to evaluate the performance of current retransmission mechanisms (No retransmission, Full retransmission, SPB retransmission) instead of listening-only methods or individual network parameters (e.g. packet loss and delay). Second, we present a new retransmission policy, which can adapt to the most suitable retransmission mechanism depending on the wireless link quality and network delay conditions. The ultimate aim of this perceived speech quality driven policy is to achieve optimum speech quality (in terms of the conversational Mean Opinion Score, MOSc) in the face of network impairment factors and wireless channel conditions, while considering the coupling effect of retransmission jitters and adaptive jitter buffers.

The paper is organized as follows. Section II describes the basic issues and methodology, including retransmission mechanisms, conversational speech quality evaluation and adaptive jitter buffers. Section III describes our simulation system. Simulation results and the proposed perceived speech quality driven retransmission scheme are presented in Section IV. Section V concludes the paper.

II. BASIC ISSUES AND METHODOLOGY

A. Speech Property-Based Retransmission Mechanisms

Speech Property-Based QoS control schemes are based on the fact that some voice frames are perceptually more important than others when encoded speech is transferred through packet networks. Recent experimental results [5] show that in some popular codecs used in wireless applications (e.g. AMR), the position of a frame loss has a significant influence on the perceived speech quality. In such codecs, frame loss concealment techniques are used to interpolate the parameters of lost frames from the parameters of previous frames. Lost voice frames at the beginning of a talkspurt will therefore be concealed using the decoding information of previous unvoiced frames. However, because voiced sounds generally have higher energy than unvoiced sounds, concealing these frames with lower-energy unvoiced frames causes serious degradation in speech quality. Moreover, at the unvoiced/voiced transition, it is difficult for the decoder to correctly conceal the loss of voiced frames using the filter coefficients and excitation of an unvoiced sound, especially when burst loss occurs or the frame size grows.

To maximize the perceptual quality at the receiving end, perceptually important voice packets may be protected by giving them a high priority, while unimportant packets are handled as 'best-effort'. SPB retransmission, a scheme that retransmits only the perceptually important speech frames, is presented in [2][3]. Experimental results reported in [2] show that SPB retransmission can provide better speech quality (assessed by EMBSD) than the No retransmission scheme, which does not retransmit any packet. In [3], SPB retransmission was shown to be more efficient in reducing retransmission delays than Full retransmission, which retransmits every unacknowledged (unACKed) packet.

B. Measuring Conversational Speech Quality

In previous studies [2][3], retransmission schemes were assessed with the EMBSD algorithm, which considers only the distortion caused by packet loss. In practice, however, both packet loss and delay are crucial in voice conversation, and long retransmission delays (e.g. on top of a long network delay) would seriously degrade speech quality. The E-model [6] was introduced by the ITU as a non-intrusive quality assessment method to obtain a measure of voice quality. Unfortunately, the E-model is only applicable to a limited set of codecs, which at present does not include the AMR codec. In our simulation, we therefore employed a technique that combines PESQ and the E-model to evaluate the performance of different retransmission schemes. In this combined approach, ITU-T PESQ is first used to quantify the impact of packet loss on speech quality, and the result is converted to the equipment impairment factor Ie. The delay impairment Id due to the average end-to-end delay is then calculated, and the E-model is used to obtain a measure of speech quality, MOSc, from Ie and Id (see Figure 1). Details of the implementation of the combined method are given in [4].

C. Adaptive Jitter Buffer and Retransmission Jitters

In VoIP applications, jitter is compensated for at the receiver by a jitter buffer. The size of a jitter buffer can be fixed or adjustable. Fixed jitter buffers cannot readily adapt to changes in network delay and as a result are not practical in real VoIP applications. In our study, we investigated fast-exp, one of the classical adaptive jitter buffer algorithms proposed in [7]. By using a smaller weighting factor when delays increase, the fast-exp algorithm adapts quickly to delay increases and so avoids discarding too many packets. Each time a packet arrives, it updates an estimate of the current mean network delay (denoted $\hat{d}_i$) and of the current variation of the network delay (denoted $\hat{v}_i$). The mean delay estimation equation is given by:

$$\hat{d}_i = \begin{cases} \beta\,\hat{d}_{i-1} + (1-\beta)\,n_i, & n_i > \hat{d}_{i-1} \\ \alpha\,\hat{d}_{i-1} + (1-\alpha)\,n_i, & n_i \le \hat{d}_{i-1} \end{cases}$$

where $n_i$ is the network delay of the $i$th packet, $\beta = 0.75$ and $\alpha = 0.998002$. The following equation is used to estimate $\hat{v}_i$:

$$\hat{v}_i = \alpha\,\hat{v}_{i-1} + (1-\alpha)\,\lvert \hat{d}_i - n_i \rvert$$

At the beginning of a talkspurt, the adaptive jitter buffer sets the playout delay using the equation

$$D = \hat{d}_i + \mu\,\hat{v}_i$$

where $D$ is the playout delay and $\mu$ is a constant that can be selected from 1 to 20. We set $\mu$ to 4 in our simulation. It should be noted that for VoIP over wireless, the network delay consists of the delays introduced by the wireline network and by the wireless link. Jitter can be introduced by network congestion in the wireline network or by retransmissions and propagation in the wireless link. Since most jitter buffer algorithms were proposed to compensate for congestion-induced jitter, it is valuable to investigate the impact of retransmission jitters for VoIP over wireless.
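The fast-exp estimator and the talkspurt playout rule can be sketched in C as follows, with $\alpha = 0.998002$, $\beta = 0.75$ and $\mu$ supplied by the caller (4 in the simulation described here). Initializing the estimates from the first packet ($\hat{d} = n$, $\hat{v} = 0$) is an assumption, since the text does not specify it, and the type and function names are hypothetical.

```c
#include <assert.h>
#include <math.h>

/* State of the fast-exp playout-delay estimator (all values in ms). */
typedef struct {
    double d;     /* mean network delay estimate, d_i         */
    double v;     /* network delay variation estimate, v_i    */
    int    init;  /* 0 until the first packet has been seen    */
} fastexp;

#define FASTEXP_ALPHA 0.998002  /* slow weight, used when delay is not increasing */
#define FASTEXP_BETA  0.75      /* fast weight, used when delay increases         */

/* Update the estimates with the network delay n of a newly arrived packet. */
void fastexp_update(fastexp *s, double n)
{
    if (!s->init) {              /* assumed initialization: d = n, v = 0 */
        s->d = n;
        s->v = 0.0;
        s->init = 1;
        return;
    }
    /* weight beta when n_i > d_{i-1}, alpha otherwise */
    double w = (n > s->d) ? FASTEXP_BETA : FASTEXP_ALPHA;
    s->d = w * s->d + (1.0 - w) * n;
    /* the variation always uses alpha and the freshly updated mean */
    s->v = FASTEXP_ALPHA * s->v + (1.0 - FASTEXP_ALPHA) * fabs(s->d - n);
}

/* Playout delay applied at the start of a talkspurt: D = d_i + mu * v_i. */
double fastexp_playout(const fastexp *s, double mu)
{
    assert(s->init);
    return s->d + mu * s->v;
}
```

A delay increase from 100 ms to 120 ms moves the mean estimate a quarter of the way (to 105 ms) in one step, whereas a later decrease moves it by only about 0.2 percent of the difference, which is the asymmetric behaviour described above.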

III. SIMULATION SYSTEM DESCRIPTION

Our study is based on the network simulator ns-2 [8], in which we simulated a last-hop wireless scenario. Both the IEEE 802.11 and the Ethernet protocol stacks are implemented in the simulator. A two-way Bernoulli error model was inserted to simulate wireless link transmission errors. In 802.11, a packet whose size exceeds the Maximum Transmission Unit (e.g. 1500 bytes for WaveLAN) will be fragmented. Since we set the packet size to 71 bytes, carrying one 12.2 kbit/s AMR speech frame per RTP packet, fragmentation is avoided.
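One breakdown that reproduces the 71-byte packet size is shown below. It assumes the 244 bits of a 12.2 kbit/s AMR frame are packed into whole octets with no AMR payload header, a 12-byte RTP header without CSRC entries, an 8-byte UDP header and a 20-byte IPv4 header without options; these header assumptions are not stated in the text.

```latex
\left\lceil \tfrac{244}{8} \right\rceil + \underbrace{12}_{\text{RTP}} + \underbrace{8}_{\text{UDP}} + \underbrace{20}_{\text{IPv4}}
  = 31 + 40 = 71\ \text{bytes} \ll 1500\ \text{bytes}
```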

The simulation system is shown in Figure 1. In our simulation, the original speech file is first encoded by the AMR codec and then analyzed to extract the speech marking information (voiced/unvoiced) for each packet. The speech marking information is used together with the network delay and the wireless link quality to control the retransmission policy. The error model determines whether a packet is corrupted according to the packet error probability (PER). The base station (BS) will neither send an ACK to the sender for a corrupted packet nor pass it to the higher layer. If the MAC layer of the sender has not received an acknowledgement for a packet, it retransmits the packet until the packet is ACKed or the retransmission limit is reached (we denote retransmission as Retx in the rest of this paper). In our simulation, we set the Retx limit to 6 for both SPB Retx and Full Retx. At the receiver, the received speech packets are fed to an adaptive jitter buffer and subsequently decoded to recover the degraded speech file, which is used to obtain a measure of speech quality.

[Figure 1: Simulation Environment. Components shown: fixed host and mobile host with RTP/UDP/IP stacks, original speech, AMR encoder, speech marking, network delay, base station (BS) with Ethernet, MAC with Retx limit control and PER error model on the PHY, adaptive jitter buffer, AMR decoder, degraded speech, and speech quality evaluation (PESQ giving MOS/Ie, end-to-end delay giving Id, E-model combining them into MOSc).]
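The per-packet retry limits of the three schemes, and the proposed quality-driven switching between them, can be sketched in C as below. The sketch assumes a caller that already holds a predicted MOSc for each scheme under the current PER and network delay (how those predictions are produced is not shown), and it breaks ties in favour of the scheme with fewer retransmissions; all names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { RETX_NONE = 0, RETX_SPB = 1, RETX_FULL = 2 } retx_scheme;

/* Retry limit for one packet. `voiced` is the per-packet speech marking;
 * `limit` is the configured maximum (6 in the simulation above). */
int retry_limit_for(retx_scheme scheme, bool voiced, int limit)
{
    assert(limit >= 0);
    switch (scheme) {
    case RETX_NONE: return 0;                 /* never retransmit          */
    case RETX_SPB:  return voiced ? limit : 0; /* protect voiced frames only */
    case RETX_FULL: return limit;             /* retransmit every unACKed packet */
    }
    return 0;
}

/* Quality-driven selection: given predicted MOSc values indexed by scheme,
 * return the scheme with the highest value. Ties keep the earlier entry,
 * i.e. the scheme that retransmits less. */
retx_scheme best_scheme(const double mosc[3])
{
    retx_scheme best = RETX_NONE;
    for (int s = RETX_SPB; s <= RETX_FULL; s++)
        if (mosc[s] > mosc[best])
            best = (retx_scheme)s;
    return best;
}
```

In such a design, `best_scheme` would be re-evaluated whenever the reported PER or network delay changes, and `retry_limit_for` would then be applied to each outgoing packet.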

In our study, we used the combined PESQ and E-model method described in Section II-B to evaluate conversational speech quality. The performance index was obtained by averaging the results computed with this method over each 20-second segment of the speech file.
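Section II-B is not reproduced here, so the following is only a sketch of one common way to combine the two tools, under our own assumptions: the PESQ score is treated as a MOS on the G.107 scale and converted to an impairment Ie by inverting the G.107 R-to-MOS mapping, the default rating factor is 93.2, and the delay impairment Id uses the Cole-Rosenbluth approximation rather than the full G.107 formula. The function names are hypothetical.

```python
# Assumptions (ours): PESQ output treated as a MOS on the G.107 scale,
# default R = 93.2, Id from the Cole-Rosenbluth approximation.
R_DEFAULT = 93.2  # G.107 rating factor with default parameters and no impairments

def mos_from_r(r: float) -> float:
    """ITU-T G.107 conversion from rating factor R to estimated MOS."""
    if r <= 0:
        return 1.0
    if r >= 100:
        return 4.5
    return 1 + 0.035 * r + r * (r - 60) * (100 - r) * 7e-6

def r_from_mos(mos: float) -> float:
    """Invert mos_from_r by bisection.

    mos_from_r increases for R above about 3.2, so the search starts
    there to keep the inverse single-valued.
    """
    lo, hi = 3.3, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if mos_from_r(mid) < mos:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def delay_impairment(one_way_delay_ms: float) -> float:
    """Id approximation: 0.024 d, plus 0.11 (d - 177.3) when d > 177.3 ms."""
    d = one_way_delay_ms
    return 0.024 * d + (0.11 * (d - 177.3) if d > 177.3 else 0.0)

def mos_c(pesq_mos: float, one_way_delay_ms: float) -> float:
    """Conversational MOS from a listening-quality score and one-way delay."""
    ie = max(0.0, R_DEFAULT - r_from_mos(pesq_mos))  # impairment inferred from PESQ
    r = R_DEFAULT - ie - delay_impairment(one_way_delay_ms)
    return mos_from_r(r)

# Same listening quality at two one-way delays.
print(round(mos_c(3.8, 175.0), 2), round(mos_c(3.8, 275.0), 2))
```

The point of the construction is that packet loss enters only through the PESQ term and delay only through Id, so two Retx schemes with similar loss can still differ in MOSc when one of them adds more delay.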

IV. RESULT ANALYSIS AND THE PROPOSED RETRANSMISSION SCHEME

The following simulation results were obtained by averaging the results of 50 simulations with different random seeds, in order to average out the effect of packet loss locations. The three simulated retransmission schemes are SPB Retx, Full Retx and No Retx.

TABLE 1 gives the average number of voiced packet losses when 73000 speech packets are transmitted over our simulated wireless network with each scheme. For simplicity, we only simulated the wireless link for the purpose of this study, so only the wireless link (Retx limit exceeded) and the adaptive jitter buffer account for packet losses. In TABLE 1, most of the voiced packet losses in Full Retx and SPB Retx are caused by the jitter buffer. Because we deployed a Bernoulli error model in our simulation, most retransmitted packets are eventually received successfully. If the burstiness of packet errors were taken into account, we would expect more voiced packet losses in the Full Retx and SPB Retx schemes.

TABLE 1 - Average voiced packet losses with the fast-exp jitter buffer

PER       No Retx   SPB Retx   Full Retx
0.0001         15         53          29
0.0005         36         54          27
0.0008         61         51          26
0.001          69         47          22
0.003         144         28          17
0.005         241         22          13
0.01          474         13           9
0.05         2344         42          16
0.10         4678        931         159
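The comparison drawn from TABLE 1 in the next paragraph can be checked mechanically. The sketch below simply encodes the table values (the dictionary layout and function name are ours) and reports, for each PER, the scheme with the fewest voiced packet losses.

```python
# Voiced packet losses from TABLE 1: PER -> (No Retx, SPB Retx, Full Retx).
TABLE1 = {
    0.0001: (15, 53, 29),
    0.0005: (36, 54, 27),
    0.0008: (61, 51, 26),
    0.001: (69, 47, 22),
    0.003: (144, 28, 17),
    0.005: (241, 22, 13),
    0.01: (474, 13, 9),
    0.05: (2344, 42, 16),
    0.10: (4678, 931, 159),
}
SCHEMES = ("No Retx", "SPB Retx", "Full Retx")

def fewest_losses(per: float) -> str:
    """Name of the scheme with the fewest voiced packet losses at this PER."""
    losses = TABLE1[per]
    return SCHEMES[losses.index(min(losses))]

for per in TABLE1:
    print(f"{per:<7} {fewest_losses(per)}")
```

It prints No Retx for PER = 0.0001 and Full Retx for every other row, and Full Retx has fewer losses than SPB Retx in every row, which are the two observations discussed below.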

It seems straightforward that SPB Retx should perform better than No Retx, and at least as well as Full Retx, in protecting voiced frames. However, TABLE 1 shows that Full Retx always loses fewer voiced packets than SPB Retx, while No Retx loses the fewest voiced packets when link quality is good (packet error probability lower than 0.0005). This is because, in the fast-exp algorithm, the estimated playout delay increases as retransmission-induced jitter increases. When link quality is good, the estimated playout delay stays at a low level, so occasionally retransmitted packets, and the packets adjacent to them, are discarded by the jitter buffer because of the jitter they introduce. In the No Retx scheme, however, a corrupted packet does not affect the packets that follow it. That is why No Retx has the fewest packet losses when link quality is very good. On the other hand, in SPB Retx unvoiced packets are not retransmitted, so the estimated playout delay cannot reflect the current wireless link situation when quality becomes worse. In Full Retx, every unACKed packet is retransmitted, which helps the adaptive jitter buffer estimate the playout delay for the next talkspurt. That is why the adaptive jitter buffer discards more packets in SPB Retx than in Full Retx.

[Figure 2 Overall packet loss rate comparison: loss rate (%, log scale) vs. packet error probability (10^-4 to 10^0) for No Retx, SPB Retx and Full Retx]

[Figure 3 Buffered retx delay comparison: buffered Retx delay (ms) vs. packet error probability (10^-4 to 10^0)]

[Figure 4 MOSc comparison with 175ms network delay: MOSc vs. packet error probability (10^-4 to 10^0)]

[Figure 5 MOSc comparison with packet error probability 0.001: MOSc vs. network delay (100 to 300 ms)]

Figure 2 and Figure 3 compare the overall packet loss rates and the buffered retransmission delays. In Figure 2, we can see that Full Retx keeps the packet loss rate at a low level at the expense of the higher delay plotted in Figure 3, because every unACKed packet is retransmitted. Interestingly, when link quality is not too bad (packet error probability up to 0.01), the packet loss rate of the Full Retx scheme decreases as link quality becomes worse. As mentioned before, with worse link quality, more retransmissions help the jitter buffer estimate the playout delay more accurately. However, when link quality is very good (packet error probability up to 0.0005), No Retx obtains the best packet loss rate because it introduces no jitter and few packets are corrupted by bit errors. As a compromise, SPB Retx has a packet loss rate and Retx delay between those of No Retx and Full Retx.
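The interaction between retransmission jitter and the playout estimate can be illustrated with a simplified fast-exp style estimator. This is a sketch under our own assumptions, not the exact algorithm used in the simulations: the delay estimate uses a slow exponential average (α = 0.998002, the value used in [7]) when delay falls and a fast one (β = 0.75, assumed) when it rises; the playout delay is taken as the estimate plus µ = 4 times the smoothed variation, assuming µ plays that role; talkspurt boundaries and head-of-line blocking of the packets behind a retransmitted one are ignored; and the 20 ms link delay and 5 ms retransmission cost are hypothetical numbers.

```python
def fast_exp_playout(delays_ms, mu=4.0, alpha=0.998002, beta=0.75):
    """Playout delay estimate d_hat + mu * v_hat after observing delays_ms.

    Assumed fast-exp form: slow smoothing (alpha) while delay falls,
    fast smoothing (beta) when delay rises.
    """
    d_hat = float(delays_ms[0])
    v_hat = 0.0
    for n in delays_ms[1:]:
        a = beta if n > d_hat else alpha                # react quickly to delay increases
        d_hat = a * d_hat + (1 - a) * n                 # smoothed delay
        v_hat = a * v_hat + (1 - a) * abs(d_hat - n)    # smoothed delay variation
    return d_hat + mu * v_hat

BASE_MS, RETX_MS = 20.0, 5.0   # hypothetical link delay and cost of one retransmission

no_retx_trace = [BASE_MS] * 500
# Every 10th packet needs one retransmission.
full_retx_trace = [BASE_MS + (RETX_MS if i % 10 == 9 else 0.0) for i in range(500)]

late_packet_ms = BASE_MS + RETX_MS   # delay of an occasionally retransmitted packet
for name, trace in (("no retx", no_retx_trace), ("frequent retx", full_retx_trace)):
    playout = fast_exp_playout(trace)
    verdict = "played" if late_packet_ms <= playout else "discarded"
    print(f"{name}: playout {playout:.1f} ms, retransmitted packet {verdict}")
```

With the jitter-free trace the estimate stays at about 20 ms, so a single retransmitted packet arrives too late and is discarded; with frequent retransmissions the estimate rises above 25 ms and the same packet is played. This mirrors the explanation of why No Retx wins only at very low PER, and why Full Retx keeps the jitter buffer better informed than SPB Retx.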

Using the evaluation method described in Section II-B, Figure 4 and Figure 5 give a more direct performance comparison of these schemes with MOSc as the metric. Our evaluation did not consider packet losses introduced in the wireline network, in order to focus on the performance of the Retx schemes; network delay, however, was considered. For natural hearing, delays lower than 100ms are hardly noticeable, but delays above 150ms clearly affect conversational interactivity [9]. Since Retx delays rarely exceed 100ms, to make the impact of Retx delay clearly visible we assume that 175ms of delay has been introduced in the wireline network and add it to the end-to-end delay in the MOSc evaluation. In Figure 4, the MOSc of Full Retx is lower than that of No Retx and SPB Retx when the packet error probability is lower than 0.003. This is because the Full Retx scheme always introduces more Retx delay, and perceived speech quality is sensitive to high delay when link quality is good. When the packet error probability exceeds 0.003, Full Retx becomes the best scheme, as it greatly reduces the number of corrupted packets. Figure 5 illustrates the performance comparison with different network delays when the packet error probability is 0.001. In Figure 5, we can see that Full Retx achieves the best MOSc when the delay is lower than 150ms. When the delay is higher than 150ms, No Retx becomes the best, which is consistent with 150ms being the threshold above which delay begins to have a severe impact on speech quality. As in Figure 4, the performance of SPB Retx lies between No Retx and Full Retx, and it is not the best on either side of the delay threshold.

Since both No Retx and Full Retx can achieve the best MOSc under different link quality and network delay conditions, we propose a new perceived speech quality driven retransmission scheme that switches between these two schemes as link quality and network delay change. The pseudo code of the new scheme is shown in Figure 6. Low_Error_Threshold is set to 0.0005 and High_Error_Threshold to 0.003 because, according to the simulation results, No Retx achieves the best MOSc when the packet error probability is lower than 0.0005 even when delay is not taken into account, whereas Full Retx becomes the best when the packet error probability exceeds 0.003, even when network delay is very high. When the packet error probability is between 0.0005 and 0.003, the decision should be made according to network delay. In the proposed scheme, Delay_Threshold is set to 150ms, as this is the threshold above which delay begins to noticeably affect speech quality. In real applications, the Bit Error Rate (BER) can be converted to PER, and the BER can be obtained from the bit errors in bit pattern series sent from the BS. Network delay can be estimated by subtracting the average mobile host (MH) to BS handoff delay from the average end-to-end delay, which can be retrieved from the RTP packet headers.
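The decision rule and the two inputs it needs can be sketched as follows. The thresholds are the ones given above; the function names are hypothetical, the BER-to-PER conversion assumes independent bit errors over a 71-byte packet, and we read the "MH to BS handoff delay" as the average delay of the wireless hop.

```python
LOW_ERROR_THRESHOLD = 0.0005   # No Retx gave the best MOSc below this PER
HIGH_ERROR_THRESHOLD = 0.003   # Full Retx gave the best MOSc above this PER
DELAY_THRESHOLD_MS = 150.0     # delay beyond which interactivity suffers

def ber_to_per(ber: float, packet_bytes: int = 71) -> float:
    """Assumes independent bit errors: a packet survives only if every bit does."""
    return 1.0 - (1.0 - ber) ** (8 * packet_bytes)

def estimate_network_delay(avg_end_to_end_ms: float, avg_mh_to_bs_ms: float) -> float:
    """Wireline delay = average end-to-end delay minus average MH-to-BS delay."""
    return avg_end_to_end_ms - avg_mh_to_bs_ms

def choose_scheme(per: float, network_delay_ms: float) -> str:
    """Return the retransmission scheme selected by the Figure 6 rule."""
    if per < LOW_ERROR_THRESHOLD:
        return "No Retx"
    if per > HIGH_ERROR_THRESHOLD:
        return "Full Retx"
    # Intermediate PER: retransmit only while the delay budget allows it.
    return "Full Retx" if network_delay_ms < DELAY_THRESHOLD_MS else "No Retx"

per = ber_to_per(2e-6)                        # about 0.0011 for 71-byte packets
delay = estimate_network_delay(195.0, 20.0)   # hypothetical averages, giving 175 ms
print(choose_scheme(per, delay))              # No Retx
```

Because only two schemes are involved, the selector needs no per-packet voiced/unvoiced information, only PER and a delay estimate, which is where the complexity saving over SPB Retx comes from.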

Figure 4 and Figure 5 also show the performance of the new perceived speech quality driven scheme under different network delays and packet error probabilities. The curve of the perceived quality driven scheme overlaps those parts of the No Retx and Full Retx curves where they achieve the best MOSc, because it switches to the more suitable of the two schemes when communication conditions change. Since this method uses Full Retx only when necessary, it can also achieve retransmission efficiency similar to SPB Retx while avoiding the implementation complexity of obtaining the speech property information that SPB Retx requires.

V. CONCLUSION

A suitable retransmission scheme is crucial for obtaining the best possible perceived speech quality in wireless VoIP applications. In this paper, we investigated the performance of three retransmission schemes (No Retx, SPB Retx and Full Retx) with regard to perceived conversational speech quality. The impact of retransmission jitter on an adaptive jitter buffer was also considered. The simulation results show that the performance of these schemes depends on the network delay and the wireless link quality. Since the wireless environment is variable, we have proposed a perceived speech quality driven retransmission scheme that adapts to wireless link quality and network delay conditions. As SPB Retx is not involved in the new method, the implementation complexity of retrieving speech property information is avoided. Our results show that the proposed method achieves the best MOSc compared with No Retx, Full Retx and SPB Retx, since it deploys the most suitable scheme as communication conditions change. In this study, a simplified last-hop wireless network was implemented to demonstrate a wireless VoIP scenario. Further improvements may be achieved by making the simulation closer to a real network, e.g. by incorporating a multi-state error model in the wireless link.

References

[1] IEEE Standards Department, 1999, IEEE 802.11 Standard for Wireless LAN Medium Access Control (MAC) and Physical Layer (PHY) Specifications.
[2] C. Hoene, I. Carreras, A. Wolisz, 2001, Voice over IP: Improving the Quality over Wireless LAN by Adopting a Booster Mechanism - An Experimental Approach, Proc. SPIE 2001 - Voice over IP (VoIP) Technology, pp. 157-, Denver, Colorado, USA.
[3] H. Sanneck, N. Tuong L. Le, et al., 2001, Selective Packet Prioritization for Wireless Voice over IP, 4th Int. Symp. on Wireless Personal Multimedia Communication, Denmark.
[4] L. Sun, E. C. Ifeachor, 2003, Prediction of Perceived Conversational Speech Quality and Effects of Playout Buffer Algorithms, to appear in Proc. IEEE ICC 2003.
[5] L. F. Sun, G. Wade, B. M. Lines, E. C. Ifeachor, 2001, Impact of Packet Loss Location on Perceived Speech Quality, Proc. 2nd IP-Telephony Workshop (IPTEL '01), Columbia University, New York, pp. 114-122.
[6] ITU-T G.107, The E-model, a computational model for use in transmission planning, May 2000.
[7] R. Ramjee, J. Kurose, D. Towsley, H. Schulzrinne, 1994, Adaptive playout mechanisms for packetized audio applications in wide-area networks, Proc. IEEE Infocom, vol. 2, pp. 680-688.
[8] The Network Simulator - ns-2, available online at http://www.isi.edu/nsnam/ns/
[9] ITU-T G.114, One-way transmission time, Feb 1999.

    if (PER < Low_Error_Threshold)
        No_Retx();
    else if (PER > High_Error_Threshold)
        Full_Retx();
    else {
        if (Network_Delay < Delay_Threshold)
            Full_Retx();
        else
            No_Retx();
    }

Figure 6 Perceived speech quality driven Retx scheme pseudo code