    Video Conferencing System

    with

    Multimedia Capabilities

    Janet Adams

    April 2005

    BACHELOR OF ENGINEERING

    IN

    TELECOMMUNICATIONS ENGINEERING

    Supervised by Dr. Derek Molloy


    Acknowledgements

    I would like to thank Dr. Derek Molloy, who supervised me on this project, for his

    enthusiasm and guidance. I would also like to thank Edward Casey, whom I

    collaborated with on certain areas of the project, and his supervisor, Dr. Gabriel

    Muntean, for his support and advice. My thanks also go to my friends Edward Casey, Edel Harrington and Hector Climent for listening to me and guiding me through my

    initial presentation. I would like to dedicate this project to my parents, who have

    supported me throughout all my time in college and especially during this, my final

    year.

    Declaration

    I hereby declare that, except where otherwise indicated, this document is entirely my

    own work and has not been submitted in whole or in part to any other university.

    Signed: ...................................................................... Date: ......................................


    Abstract

    This document will describe the development of a video conferencing system with

    multimedia capabilities. The concept of multicasting will be explored as this was used

    in the development of the video conferencing. Other concepts, which were used in the

    development of the system, such as the Java Media Framework, the Real-time Transport Protocol and a number of encoding schemes, will also be investigated.

    The design of the system will explain how each of the features was planned for and

    developed, and will provide the user with an understanding of video conferencing,

    client server communications, motion detection and much more. The implementation

    section reads like a user manual. On completion of this section, the reader should be

    able to make full use of all of the features within the application and should

    understand the depth to which each of the features can be used.

    When this document has been read, the reader will fully understand both how the

    system was developed and how it can be used, as well as the technical background

    needed to understand how the different features work.


    Table of Contents

    ACKNOWLEDGEMENTS
    DECLARATION
    ABSTRACT
    TABLE OF CONTENTS
    TABLE OF FIGURES
    TABLE OF TABLES
    1 INTRODUCTION
    1.1 AIM OF THIS PROJECT
    1.2 CURRENT EXAMPLES OF SIMILAR APPLICATIONS
    1.3 EQUIPMENT AND SOFTWARE
    1.3.1 JBuilder 2005
    1.3.2 Logitech Webcam
    1.3.3 Laptop
    1.3.4 Digital Camcorder
    2 TECHNICAL BACKGROUND
    2.1 JAVA MEDIA FRAMEWORK
    2.1.1 Introduction
    2.1.2 JMF Architecture
    2.1.3 Principal Elements
    2.1.4 Common Media Formats
    2.1.5 Real Time Transport Protocol (RTP) Architecture in JMF
    2.1.6 Alternatives to JMF
    2.1.7 Summary
    2.2 REAL-TIME TRANSPORT PROTOCOL
    2.2.1 Introduction
    2.2.2 Some RTP Definitions
    2.2.3 RTP Data Structures
    2.2.4 RTP Control Protocol
    2.2.5 Alternatives to RTP
    2.2.6 Summary
    2.3 AUDIO ENCODING SCHEME G.723.1
    2.3.1 Introduction
    2.3.2 Encoder Principles
    2.3.3 Decoder Principles
    2.3.4 Alternative Audio Encoding Schemes
    2.3.5 Summary
    2.4 VIDEO ENCODING SCHEME H.263
    2.4.1 Introduction
    2.4.2 Summary of Operation
    2.4.3 Alternative Video Encoding Schemes
    2.4.4 Summary
    2.5 IMAGE OBSERVATION
    2.5.1 Initial Ideas
    2.5.2 The Way it Works
    2.6 MULTICASTING
    2.6.1 Alternatives to Multicasting
    2.6.2 What is Multicasting
    2.7 SUMMARY
    3 DESIGN OF THE SYSTEM
    3.1 SYSTEM ARCHITECTURE
    3.1.1 Client to Server Communication
    3.1.2 Client to Client Communication
    3.2 SYSTEM DESIGN
    3.2.1 The Server
    3.2.2 The Client
    3.3 MESSAGING STRUCTURE
    3.4 CONFERENCING
    3.5 IMAGE OBSERVATION
    3.6 COMMON PROCEDURES WITHIN THE APPLICATION
    3.6.1 Login
    3.6.2 Call Setup
    3.6.3 Call Teardown
    3.6.4 Logout
    3.7 OTHER FEATURES WITHIN THE APPLICATION
    4 IMPLEMENTATION OF THE SYSTEM
    4.1 INTRODUCTION
    4.2 LOGGING IN
    4.3 CALLS
    4.3.1 Making a Peer to Peer Call
    4.3.2 Receiving a Person to Person Call
    4.3.3 Initiating a Conference Call
    4.3.4 Joining a Conference Call
    4.4 MESSAGES
    4.4.1 Sending an MMS Message
    4.4.2 Receiving an MMS Message
    4.4.3 Videomail Messages
    4.5 EXTRA FEATURES
    4.5.1 Image Observation
    4.5.2 Adaption
    4.6 USING THE SERVER
    5 RESULTS AND DISCUSSION
    6 CONCLUSIONS AND FURTHER RESEARCH
    6.1 THE BENEFITS OF THIS PROJECT
    6.2 THE IMPACT OF THIS PROJECT
    6.3 FUTURE RESEARCH POSSIBILITIES
    6.4 MEETING THE REQUIREMENTS
    REFERENCES
    7 APPENDIX 1
    7.1 CALL SETUP REQUEST
    7.2 LOGIN REQUEST
    7.3 LOGOFF REQUEST
    7.4 CALL END REQUEST
    7.5 CONFERENCE SETUP REQUEST
    7.6 ADD PARTICIPANT TO CONFERENCE REQUEST
    7.7 END CONFERENCE REQUEST
    7.8 SEND MESSAGE REQUEST
    7.9 RECEIVE MESSAGE REQUEST
    8 APPENDIX 2
    8.1 IMAGE OBSERVATION CODE


    Table of Figures

    FIGURE 2.1 - MEDIA PROCESSING MODEL
    FIGURE 2.2 - SYSTEM PROCESSING MODEL
    FIGURE 2.3 - JMF BASIC SYSTEM MODEL
    FIGURE 2.4 - RTP AND THE OSI MODEL
    FIGURE 2.5 - RTP PACKET HEADER FORMAT
    FIGURE 2.6 - RTCP SENDER REPORT STRUCTURE
    FIGURE 2.7 - RTCP RECEIVER REPORT STRUCTURE
    FIGURE 2.8 - G.723.1 ENCODER
    FIGURE 2.9 - G.723.1 DECODER
    FIGURE 2.10 - H.263 BASELINE ENCODER
    FIGURE 2.11 - MACROBLOCKS WITHIN H.263
    FIGURE 2.12 - MOTION PREDICTION
    FIGURE 2.13 - ORIGINAL CONFERENCING PLAN
    FIGURE 2.14 - MULTICASTING THROUGH ROUTER
    FIGURE 3.1 - CLIENT TO SERVER COMMUNICATION
    FIGURE 3.2 - CLIENT TO CLIENT COMMUNICATION
    FIGURE 3.3 - SERVER CLASS DIAGRAM
    FIGURE 3.4 - EXAMPLE OF PUSH PULL MESSAGE SETUP
    FIGURE 3.5 - CLIENT CLASS DIAGRAM
    FIGURE 3.6 - ALLOCATING A CONFERENCE POSITION
    FIGURE 3.7 - MESSAGE SEQUENCE CHART FOR CONFERENCE CALL
    FIGURE 3.8 - CONFERENCING SETUP
    FIGURE 3.9 - IMAGE OBSERVATION AVERAGES
    FIGURE 3.10 - MESSAGE SEQUENCE CHART FOR LOGIN
    FIGURE 3.11 - MESSAGE SEQUENCE CHART FOR CALL SETUP
    FIGURE 3.12 - MESSAGE SEQUENCE CHART FOR CALL TEARDOWN
    FIGURE 3.13 - MESSAGE SEQUENCE CHART FOR LOGOUT
    FIGURE 4.1 - LOGIN SCREEN
    FIGURE 4.2 - HOME SCREEN
    FIGURE 4.3 - MAKING A P2P CALL
    FIGURE 4.4 - DURING A CALL
    FIGURE 4.5 - CALL ACCEPT/REJECT
    FIGURE 4.6 - INITIATING A CONFERENCE CALL
    FIGURE 4.7 - CONFERENCE REQUEST
    FIGURE 4.8 - MMS SCREEN
    FIGURE 4.9 - ATTACH BUTTON FILE CHOOSER
    FIGURE 4.10 - MMS SCREEN READY TO SEND
    FIGURE 4.11 - UNIFIED INBOX SCREEN
    FIGURE 4.12 - MESSAGE POPUP WINDOW
    FIGURE 4.13 - LEAVE VIDEOMAIL REQUEST
    FIGURE 4.14 - VIDEOMAIL COMPOSE
    FIGURE 4.15 - VIDEOMAIL POPUP
    FIGURE 4.16 - SERVER LOGIN SCREEN
    FIGURE 4.17 - SERVER ACTIVITY SCREEN
    FIGURE 4.18 - SERVER CLIENT STATUS SCREEN
    FIGURE 4.19 - SERVER CLIENT STATUS SCREEN WITH CLIENTS
    FIGURE 4.20 - SERVER ADMINISTRATION SCREEN


    Table of Tables

    TABLE 2.1 - JMF COMMON VIDEO FORMATS
    TABLE 2.2 - JMF COMMON AUDIO FORMATS
    TABLE 5.1 - TESTING SCENARIOS: LOGIN/LOGOFF
    TABLE 5.2 - TESTING SCENARIOS: MAKING A CALL
    TABLE 5.3 - TESTING SCENARIOS: SENDING A MESSAGE
    TABLE 5.4 - TESTING SCENARIOS: CONFERENCE CALL
    TABLE 5.5 - OTHER TESTING SCENARIOS


    Chapter 1 - Introduction

    Almost all organisations, for example office blocks, colleges and shopping centres,

    have telephone systems installed in them. These telephone systems allow features such

    as call forward, call divert, voicemail, free extension dialling to other users within

    the same network, etc. Computers are another item found in almost all of these

    facilities, usually one per user. Therefore, in the majority of establishments, you will

    find that every employee has a telephone handset and a computer. A cost-effective and

    space-saving idea would be to combine these two everyday utilities so that the

    computer can also be used as a phone. People want their lives and work to be as

    simple and time-efficient as possible, and one way to achieve this is to have a software-

    based telephony system on their computers. Why do they need a physical telephone

    handset when it is possible to attain all the same features on their computers, cutting

    out the expense of the handset?

    1.1 Aim of this Project

    The aim of this project is to develop a video conferencing facility with multicasting

    capability and MMS functionality. The application will be developed in Java making

    use of the Java Media Framework for real time applications. The project will be

    developed in conjunction with Edward Casey, who will develop Videomail and

    adaption features to add to the system.

    1.2 Current Examples of Similar Applications

    There are some examples of software based phone systems available. One example is

    Skype, an internet phone system. This allows users to have voice conversations, free

    of charge, over the internet, provided that the party they are calling is also using the

    Skype service. The disadvantage is that a company employing this system would have

    no control over their users. Another example is Vonage, which offers the same sort of

    service as Skype and hence the same disadvantages.


    1.3 Equipment and Software

    1.3.1 JBuilder 2005

    This is the program that was used to code and compile all of the Java code. The reason

    that this program was chosen is that it was available free of charge and was very

    straightforward to use; it was simple, but it did the job. One of the features that

    was very helpful in this program was that it highlighted any common coding errors,

    which saved a lot of time. In other situations, the developer may not have been

    informed of these errors until after compilation.

    1.3.2 Logitech Webcam

    This was used for the testing of the video calls.

    1.3.3 Laptop

    Testing was difficult as very few of the features could be tested alone. Almost all

    testing required two computers. For this reason, it was most efficient to use two

    laptops connected to two webcams.

    1.3.4 Digital Camcorder

    The digital camcorder was used for the development of the image observation, as the

    low quality of the webcam introduced noise to the image, which hampered the calculation of

    an adequate threshold value.


    Chapter 2 - Technical Background

    In this chapter, the various standards used in the design of this system will be

    discussed. The standards chosen were based on what was supported by the Java Media

    Framework. There were possibly more suitable options available, for example

    with the encoding schemes, but the choice was limited by what was supported by the

    Java Media Framework and the Real-Time Transport Protocol. The standards

    discussed within this chapter were the basic building blocks of this project.

    2.1 Java Media Framework

    2.1.1 Introduction

    It is often the case that a Java developer will want to include some real-time media

    within their Java application or applet. Prime examples of such real-time media would

    be audio and video. The Java Media Framework (JMF) was developed to enable this

    to happen. JMF allows the capture, playback, streaming and transcoding of multiple

    media formats. JMF is an extension of the Java platform that provides a powerful

    toolkit from which scalable, cross platform applications can be developed.

    Any data that changes with respect to time can be characterized as real-time media.

    With real-time media, the idea is that you will see it as it happens. So for example, if

    you are partaking in a video conference, you expect that there should not be a

    significant delay between when the other person says something to you, and when you

    hear and see them saying it. Audio clips, MIDI sequences, movie clips, and animations

    are common forms of time-based media. Such media data can be obtained from a

    variety of sources, such as local or network files, cameras, microphones, and live

    broadcasts. Figure 2.1, below, shows a media processing model. There are three main

    elements within the system - the input, the output and the processor. Think of the input

    as where the data comes from: it could be a capture device such as a video camera, a

    file, or data that has been received over a network. Before the input can

    reach the output, it has to be formatted so that it can be received correctly. This

    formatting takes place in the processor. A processor can do many things to the data,


    some of which include compressing/decompressing, applying effect filters and

    converting into the correct format using the encoding scheme which has been

    specified. Once the data has been correctly formatted by the processor, it is then

    passed on to the output so that the end user can see or hear it. The output could simply

    be a player, such as a speaker or a television; it could save the data to a file, or it could

    send it across the network.

    Figure 2.1 - Media Processing Model

    To relate the media processor model shown above to this particular project, let us take

    a look at Figure 2.2. As can be seen immediately, this system has more components

    than the one shown above. However it can still be divided into the same three parts,

    input, processor and output. The input consists of the MediaLocator which

    represents the address of the device, and the data source, which is constructed using

    the MediaLocator and is the interface to the device. The data is then taken from the

    input and sent to the processor. The processor in the system consists of the processor

    itself, which takes the data and converts it into the encoding scheme that has been

    defined for the system. The other element of the processor is the RTPManager. The

    transmission RTPManager takes the encoded data from the processor and packetizes

    it, so that it can be sent over the network. The data is then transmitted over the

    network where it is met on the other side by the receiver RTPManager, which takes

    the data and depacketizes it, converting it back into a format that can be read by the

    player. Once this stage has been completed, the data is passed to the output, consisting

    of the player and the speaker (the example shown here is for a voice call, the speaker

    could be a monitor or any other sort of output device that the media can be seen or

    heard on). The player takes the encoded data and decodes it, then sends it to the output

    device so that the receiver can see or hear it.


    Figure 2.2 - System Processing Model
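    To make the flow of Figure 2.2 concrete, the fragment below is a minimal sketch of
    the transmit side written against the JMF 2.x API. The capture locator "vfw://0" and
    the destination address are placeholders, and Manager.createRealizedProcessor() is
    used so that the state handling (discussed in Section 2.1.3) stays out of the way.

        import java.net.InetAddress;
        import javax.media.*;
        import javax.media.format.VideoFormat;
        import javax.media.protocol.*;
        import javax.media.rtp.*;

        public class TransmitSketch {
            public static void main(String[] args) throws Exception {
                // Input: the MediaLocator is the address of the device and the
                // DataSource is the interface to it ("vfw://0" is a placeholder).
                MediaLocator locator = new MediaLocator("vfw://0");
                DataSource source = Manager.createDataSource(locator);

                // Processor: encode the raw capture as H.263 framed for RTP.
                Processor processor = Manager.createRealizedProcessor(new ProcessorModel(
                        source,
                        new Format[] { new VideoFormat(VideoFormat.H263_RTP) },
                        new ContentDescriptor(ContentDescriptor.RAW_RTP)));
                processor.start();

                // Transmission RTPManager: packetize the encoded data and send it.
                RTPManager rtp = RTPManager.newInstance();
                rtp.initialize(new SessionAddress(InetAddress.getLocalHost(), 42050));
                rtp.addTarget(new SessionAddress(InetAddress.getByName("192.168.0.2"), 42050));
                SendStream stream = rtp.createSendStream(processor.getDataOutput(), 0);
                stream.start();
            }
        }

    On the receiving side the same pieces appear in reverse: a receiver RTPManager
    depacketizes the stream and hands a DataSource to a Player.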

    2.1.2 JMF Architecture

    The most practical example of real-time media comes from a basic home movie

    system. I have shown this system below in Figure 2.3. Imagine someone is making a

    home movie, the first thing that they do is record it onto a video tape using a

    camcorder. So they are using a capture device (the camcorder) and recording onto a

    data source (the video tape).

    Once they have made the movie, the next logical thing that they would want to do

    would be to watch it. So, thinking of the system processing model, they would need

    some sort of processor that would take the data from the data source and convert it into

    some format that they can see and hear. This processor would be a VCR. When the

    data source is placed into the processor, the data is transmitted to the final stage of the

    system processing model, the output. In this case, the television will be the principal

    output device. There will more than likely be speakers on the television that will

    transmit the audio part of the media. So below we have a very basic processing model

    that many people use every day at home.


    Figure 2.3 - JMF Basic System Model

    Yet even though the model shown in Figure 2.3 seems very basic, it still contains the

    main elements of the more complicated system processing model that is shown above, in

    Figure 2.2.

    2.1.3 Principal Elements

    Data Source

    In JMF, a DataSource is the audio or media source, or possibly a combination of

    the two, e.g. a webcam with an integrated microphone. It could also be an incoming

    stream across a network, for example the internet, or a file. Once the location or

    protocol of the data is determined, the data source encapsulates both the media

    location and the protocol and software used to deliver the media. When a

    DataSource is sent to a Player, the Player is unconcerned about the origin of

    the DataSource.

    There are two types of DataSources, determined by how the data transfer initiates:

    Pull data source: Here the data flow is initiated by the client and the data flow

    from the source is controlled by the client.

    Push data source: Here the data flow is initiated by the server and the data flow

    from the source is controlled by the server.

    Several data sources can be combined into one. So if you are capturing a live scene

    with two data sources: audio and video, these can be combined for easier control.
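    JMF supports this combination directly. A hedged sketch, assuming both capture
    locators exist on the machine and that the code runs inside a method declared to
    throw Exception:

        import javax.media.Manager;
        import javax.media.MediaLocator;
        import javax.media.protocol.DataSource;

        // Combine an audio and a video capture source into a single DataSource
        // for easier control (both locator strings are placeholders).
        DataSource audio = Manager.createDataSource(new MediaLocator("javasound://44100"));
        DataSource video = Manager.createDataSource(new MediaLocator("vfw://0"));
        DataSource combined = Manager.createMergingDataSource(new DataSource[] { audio, video });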


    Capture Device

    A capture device is the piece of hardware that you would use to capture the data,

    which you would connect to the DataSource. Examples would be a microphone or

    a webcam. The captured media can then be sent to the Player, converted into

    another format or even stored to be used at a later stage.

    Like DataSources, capture devices can be either a push or a pull source. If a

    capture device is a pull source, then the user controls when to capture the image; if it is

    a push source, then the user has no control over when the data is captured; it will be

    captured continuously.

    Player

    As mentioned above, a Player takes a stream of data and renders it to an output

    device. A Player can be in any one of a number of states. Usually, a Player would

    go from one state to the next until it reaches the final state. The reason for these states

    is so the data can be prepared before it is played. JMF defines the following six states

    for the Player:

    Unrealized: In this state, the Player object has just been instantiated and

    does not yet know anything about its media.

    Realizing: A Player moves from the unrealized state to the realizing state

    when the Player's realize() method is called. In this state, the Player is

    in the process of determining its resource requirements.

    Realized: Transitioning from the realizing state, the Player comes into the

    realized state. In this state the Player knows what resources it needs and has

    information about the type of media it is to present. It can also provide visual

    components and controls, and its connections to other objects in the system are

    in place. A player is often created already in this state, using the

    createRealizedPlayer() method.

    Prefetching: When the prefetch() method is called, a Player moves

    from the realized state into the prefetching state. A prefetching Player is

    preparing to present its media. During this phase, the Player preloads its

    media data, obtains exclusive-use resources, and does whatever else is needed

    to play the media data.


    Prefetched: The state where the Player has finished prefetching media data;

    it is ready to start.

    Started: This state is entered when you call the start() method. The

    Player is now ready to present the media data.
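    The sketch below illustrates walking through these states: it creates a Player,
    registers a ControllerListener, and drives the Player forward as each transition
    completes. The file URL is a placeholder.

        import javax.media.*;

        public class PlayerStateSketch implements ControllerListener {
            public static void main(String[] args) throws Exception {
                Player player = Manager.createPlayer(new MediaLocator("file:///tmp/clip.mov"));
                player.addControllerListener(new PlayerStateSketch());
                player.realize();               // unrealized -> realizing -> realized
            }

            public void controllerUpdate(ControllerEvent event) {
                Player player = (Player) event.getSourceController();
                if (event instanceof RealizeCompleteEvent) {
                    player.prefetch();          // realized -> prefetching -> prefetched
                } else if (event instanceof PrefetchCompleteEvent) {
                    player.start();             // prefetched -> started
                }
            }
        }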

    Processor

    A Processor is a type of Player that has added control over what processing is

    performed on the input media stream. As well as the six aforementioned Player

    states, a Processor includes two additional states that occur before the

    Processor enters the realizing state but after the unrealized state:

    Configuring: A Processor enters the configuring state from the unrealized

    state when the configure() method is called. A Processor exists in the

    configuring state when it connects to the DataSource, demultiplexes the

    input stream, and accesses information about the format of the input data.

    Configured: From the configuring state, a Processor moves into the

    configured state when it is connected to the DataSource and the data format

    has been determined.

    As with a Player, a Processor transitions to the realized state when the

    realize() method is called.
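    As an illustration, the hedged fragment below walks a Processor into the configured
    state by hand and then chooses an output format for each track before realizing.
    waitForState() stands in for a small helper (not shown) that blocks until the
    matching transition event arrives, and dataSource is assumed to come from earlier
    in the code.

        import javax.media.*;
        import javax.media.control.TrackControl;
        import javax.media.format.AudioFormat;
        import javax.media.format.VideoFormat;

        Processor p = Manager.createProcessor(dataSource);
        p.configure();
        waitForState(p, Processor.Configured);   // the input is demultiplexed here
        for (TrackControl track : p.getTrackControls()) {
            if (track.getFormat() instanceof VideoFormat) {
                track.setFormat(new VideoFormat(VideoFormat.H263_RTP));
            } else if (track.getFormat() instanceof AudioFormat) {
                track.setFormat(new AudioFormat(AudioFormat.G723_RTP));
            }
        }
        p.realize();
        waitForState(p, Controller.Realized);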

    DataSink

    The DataSink is a base interface for objects that read media content delivered by a

    DataSource and render the media to some destination, typically a file.
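    A hedged sketch of a DataSink in use, writing a realized processor's output to a
    file (the path is a placeholder and exception handling is omitted):

        import javax.media.DataSink;
        import javax.media.Manager;
        import javax.media.MediaLocator;
        import javax.media.protocol.DataSource;

        // "processor" is assumed to be a realized Processor whose content
        // descriptor describes a file type rather than RAW_RTP.
        DataSource output = processor.getDataOutput();
        DataSink sink = Manager.createDataSink(output,
                new MediaLocator("file:///tmp/recording.mov"));
        sink.open();
        sink.start();
        // ... stop the processor when finished, then:
        sink.close();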

    Format

    A Format object represents an object's exact media format. The format itself carries no

    encoding-specific parameters or global-timing information; it describes the format's

    encoding name and the type of data the format requires. Format subclasses include:

    AudioFormat

    VideoFormat

    In turn, VideoFormat contains six direct subclasses:

    H261Format

    H263Format


    IndexedColorFormat

    JPEGFormat

    RGBFormat

    YUVFormat

    As will be discussed in more detail later on in this report, the formats that were chosen for this project were H.263 for the video and G.723 mono for the audio.
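    In code, the two chosen encodings are named through these Format subclasses; the
    *_RTP constants select the RTP-framed variant of each codec:

        import javax.media.Format;
        import javax.media.format.AudioFormat;
        import javax.media.format.VideoFormat;

        Format video = new VideoFormat(VideoFormat.H263_RTP);   // H.263 video
        Format audio = new AudioFormat(AudioFormat.G723_RTP);   // G.723.1 mono audio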

    Manager

    A manager, an intermediary object, integrates implementations of key interfaces that

    can be used seamlessly with existing classes. JMF offers four managers:

    Manager: Use Manager to create Players, Processors,

    DataSources, and DataSinks.

    PackageManager: This manager maintains a registry of packages that contain

    JMF classes, such as custom Players, Processors, DataSources, and

    DataSinks.

    CaptureDeviceManager: This manager maintains a registry of available

    capture devices.

    PlugInManager: This manager maintains a registry of available JMF plug-in

    processing components.
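    For example, the CaptureDeviceManager registry can be queried for a device that
    supports a given format; a small sketch (printing whatever the registry happens to
    return on the machine it runs on):

        import java.util.Vector;
        import javax.media.CaptureDeviceInfo;
        import javax.media.CaptureDeviceManager;
        import javax.media.format.AudioFormat;

        // Ask the registry for devices that can capture linear audio.
        Vector devices = CaptureDeviceManager.getDeviceList(new AudioFormat(AudioFormat.LINEAR));
        if (!devices.isEmpty()) {
            CaptureDeviceInfo info = (CaptureDeviceInfo) devices.get(0);
            System.out.println("Found " + info.getName() + " at " + info.getLocator());
        }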

    2.1.4 Common Media Formats

    Table 2.1 and Table 2.2 below identify some of the characteristics of common media

    formats. When selecting the format for this system, the main consideration was the

    bandwidth. This needed to be as low as possible. Obviously, the quality should be as

    high as possible. The CPU requirement wasn't really an issue. Each client would be

    working on a separate computer with its own CPU capabilities, so it wasn't something

    that needed to be taken into account in choosing the encoding schemes.

    So looking at Table 2.1, which lists the most common video formats, it can be seen that

    H.263 is the only one that meets the low bandwidth requirement. The quality is

    medium, which is perfectly acceptable for this sort of application. Therefore, this was

    the video encoding scheme that was chosen.


    Looking at Table 2.2, for the audio, it can be seen that there are two formats that meet

    the low bandwidth requirement. These are GSM and G.723.1. Of these two, the former

    has a low quality while the latter has medium quality. It therefore made more sense to

    choose G.723.1. I have marked the chosen encoding schemes with an asterisk in the tables below.

    Format     Content Type           Quality   CPU Requirements   Bandwidth Requirements
    Cinepak    AVI, QuickTime         Medium    Low                High
    MPEG-1     MPEG                   High      High               High
    H.261      AVI, RTP               Low       Medium             Medium
    H.263 *    QuickTime, AVI, RTP    Medium    Medium             Low
    JPEG       QuickTime, AVI, RTP    High      High               High
    Indeo      QuickTime, AVI         Medium    Medium             Medium

    Table 2.1 - JMF Common Video Formats


    Format              Content Type                Quality   CPU Requirements   Bandwidth Requirements
    PCM                 AVI, QuickTime, WAV         High      Low                High
    Mu-Law              AVI, QuickTime, WAV, RTP    Low       Low                High
    ADPCM (DVI, IMA4)   AVI, QuickTime, WAV, RTP    Medium    Medium             Medium
    MPEG-1              MPEG                        High      High               High
    MPEG Layer 3        MPEG                        High      High               Medium
    GSM                 WAV, RTP                    Low       Low                Low
    G.723.1 *           WAV, RTP                    Medium    Medium             Low

    Table 2.2 - JMF Common Audio Formats

    As it happens, the schemes that were chosen are ideal for the application as H.263 was

    developed for video conferencing applications and is optimised for video where there

    is not much movement, and G.723 is typically used for low bit rate speech, such as telephony applications.

    2.1.5 Real Time Transport Protocol (RTP) Architecture in JMF

    The JMF RTP APIs are designed to work seamlessly with the capture, presentation,

    and processing capabilities of JMF. Players and processors are used to present and

    manipulate RTP media streams just like any other media content. You can transmit


    media streams that have been captured from a local capture device using a capture

    DataSource or that have been stored to a file using a DataSink. Similarly, JMF can be

    extended to support additional RTP formats and payloads through the standard plug-

    in mechanism. [Java Media Framework API Guide, http://java.sun.com/products/java-media/jmf/2.1.1/guide/index.html, November 19, 1999 (April 2005)]

    Session Manager

    In JMF, a SessionManager is used to coordinate an RTP session. The session

    manager keeps track of the session participants and the streams that are being

    transmitted. The session manager maintains the state of the session as viewed from the

    local participant. The SessionManager interface defines methods that enable an

    application to initialize and start participating in a session, remove individual streams

    created by the application, and close the entire session.

    Session Statistics: The session manager maintains statistics on all of the RTP

    and RTCP packets sent and received in the session. The session manager

    provides access to two types of global statistics:

    o GlobalReceptionStats: Maintains global reception statistics for the session.

    o GlobalTransmissionStats: Maintains cumulative transmission statistics for all local senders.

    Statistics for a particular recipient or outgoing stream are available from the

    stream:

    o ReceptionStats: Maintains source reception statistics for an individual

    participant.

    o TransmissionStats: Maintains transmission statistics for an individual

    send stream.

    Session Participants: The session manager keeps track of all of the

    participants in a session. Each participant is represented by an instance of a

    class that implements the Participantinterface. SessionManagers create a

    Participant whenever an RTCP packet arrives that contains a source

    description (SDES) with a canonical name (CNAME) that has not been seen

    before in the session. A participant can own more than one stream, each of


    which is identified by the synchronization source identifier (SSRC) used by the

    source of the stream.

    Session Streams: The SessionManager maintains an RTPStream object for

    each stream of RTP data packets in the session. There are two types of RTP

    streams:

    o ReceiveStream represents a stream that's being received from a

    remote participant.

    o SendStream represents a stream of data coming from the

    Processor or input DataSource that is being sent over the network.

    A ReceiveStream is constructed automatically whenever the session

    manager detects a new source of RTP data.
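    This project uses the RTPManager class, which plays the session manager role in
    JMF 2.x. A hedged sketch of initialising the local end of a session and reading the
    session-wide statistics described above (addresses are placeholders; assume a
    method declared to throw Exception):

        import java.net.InetAddress;
        import javax.media.rtp.*;

        RTPManager session = RTPManager.newInstance();
        session.initialize(new SessionAddress(InetAddress.getLocalHost(), 42050));
        session.addTarget(new SessionAddress(InetAddress.getByName("192.168.0.2"), 42050));

        // Session-wide statistics maintained by the manager.
        GlobalReceptionStats in = session.getGlobalReceptionStats();
        GlobalTransmissionStats out = session.getGlobalTransmissionStats();
        System.out.println("packets received: " + in.getPacketsRecd()
                + ", RTCP packets sent: " + out.getRTCPSent());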

    RTP Events

    RTP-specific events are used to report on the state of the RTP session and streams. To

    receive notification of RTP events, you implement the appropriate RTP listener and

    register it with the session manager:

    SessionListener: Receives notification of changes in the state of the session.

    You can implement SessionListener to receive notification about events

    that pertain to the RTP session as a whole, such as the addition of new

    participants. There are two types of session-wide events:

    o NewParticipantEvent: Indicates that a new participant has joined the session.

    o LocalCollisionEvent: Indicates that the participant's synchronization source is already in use.

    SendStreamListener: Receives notification of changes in the state of an RTP

    stream that's being transmitted. You can implement SendStreamListener to

    receive notification whenever:

    o New send streams are created by the local participant.

    o The transfer of data from the DataSource used to create the send stream has started or stopped.

    o The send stream's format or payload changes.

    There are five types of events associated with a SendStream:

    o NewSendStreamEvent: Indicates that a new send stream has been created by the local participant.


    o ActiveSendStreamEvent: Indicates that the transfer of data from the DataSource used to create the send stream has started.

    o InactiveSendStreamEvent: Indicates that the transfer of data from the DataSource used to create the send stream has stopped.

    o LocalPayloadChangeEvent: Indicates that the stream's format or payload has changed.

    o StreamClosedEvent: Indicates that the stream has been closed.

    ReceiveStreamListener: Receives notification of changes in the state of an

    RTP stream that's being received. You can implement

    ReceiveStreamListener to receive notification whenever:

    o New receive streams are created.

    o The transfer of data starts or stops.

    o The data transfer times out.

    o A previously orphaned ReceiveStream has been associated with a Participant.

    o An RTCP APP packet is received.

    o The receive stream's format or payload changes.

    You can also use this interface to get a handle on the stream and access the

    RTP DataSource so that you can create a MediaHandler.

    There are seven types of events associated with a ReceiveStream:

    o NewReceiveStreamEvent: Indicates that the session manager has created a new receive stream for a newly-detected source.

    o ActiveReceiveStreamEvent: Indicates that the transfer of data has started.

    o InactiveReceiveStreamEvent: Indicates that the transfer of data has stopped.

    o TimeoutEvent: Indicates that the data transfer has timed out.

    o RemotePayloadChangeEvent: Indicates that the format or payload of the receive stream has changed.

    o StreamMappedEvent: Indicates that a previously orphaned receive stream has been associated with a participant.

    o ApplicationEvent: Indicates that an RTCP APP packet has been received.


    RemoteListener: Receives notification of events or RTP control messages

    received from a remote participant. You might want to implement RemoteListener in an

    application used to monitor the session - it enables you to receive RTCP reports and monitor the quality of the session reception without having to

    receive data or information on each stream. There are three types of events

    associated with a remote participant:

    o ReceiverReportEvent: Indicates that an RTP receiver report has been

    received.

    o SenderReportEvent: Indicates that an RTP sender report has been

    received.

    o RemoteCollisionEvent: Indicates that two remote participants are

    using the same synchronization source ID (SSRC).
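    Of these listeners, the ReceiveStreamListener is the one a conferencing client
    cannot do without. A hedged sketch of the usual pattern: on a NewReceiveStreamEvent,
    fetch the stream's RTP DataSource and hand it to a Player.

        import javax.media.Manager;
        import javax.media.Player;
        import javax.media.protocol.DataSource;
        import javax.media.rtp.*;
        import javax.media.rtp.event.*;

        public class ReceiverSketch implements ReceiveStreamListener {
            public void update(ReceiveStreamEvent event) {
                if (event instanceof NewReceiveStreamEvent) {
                    try {
                        ReceiveStream stream =
                                ((NewReceiveStreamEvent) event).getReceiveStream();
                        DataSource source = stream.getDataSource(); // the RTP DataSource
                        Player player = Manager.createPlayer(source);
                        player.realize();   // then prefetch and start as events arrive
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }
        }
        // Registered with: rtpManager.addReceiveStreamListener(new ReceiverSketch());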

    2.1.6 Alternatives to JMF

    There was no real alternative to JMF using Java. However, if another programming

    language had been used there would have been alternatives available. An example

    would be to use the C++ programming language in conjunction with the Microsoft

    DirectShow API, which includes libraries for rendering media content. There is an open

    source project being undertaken at the moment to create a SIP communicator using

    Java and the JMF. Aside from this, there are no real similar applications using

    Java, and this was the reason that Java was chosen.

    2.1.7 Summary

    As can be seen from the above sections, JMF is a very powerful tool. It is very easy to

    work with and the best way to understand it is to use it. It is fair to say that there is a

    lot of information, such as forums, help-sites etc. on the World Wide Web regarding

    this subject. However, there is not a lot of information on using JMF for projects

    similar to this one. Perhaps one of the best features of JMF is that it does not require

    one to learn everything about it before using it. With a basic understanding of Java, it

    is possible to teach yourself as you go along.


    2.2 Real-Time Transport Protocol

    2.2.1 Introduction

    The real-time transport protocol (RTP) provides end-to-end delivery services for data

    with real-time characteristics, such as interactive audio and video. These services

    include payload type identification, sequence numbering, time-stamping and delivery

    monitoring. Applications typically run RTP on top of UDP to make use of its

    multiplexing and checksum services; both protocols contribute to parts of the

    transport protocol functionality. However, RTP may be used with other suitable

    underlying network or transport protocols. RTP supports data transfer to multiple

    destinations using multicast distribution if provided by the underlying network. [RTP

    Technology, http://www.ixiacom.com/library/technology _guides /tg_display.php? key

    = rtp, (April 2005)]

    Although RTP is used for real-time media, it does not actually ensure that packets are

    delivered on time itself, but relies on lower layer services to ensure this, and other

    quality-of-service (QOS) guarantees. Each packet has a sequence number and this

    allows the receiver to reconstruct the packets into the correct order.

    In defining RTP, two closely linked parts will be described:

    The real-time transport protocol (RTP), to carry data that has real-time

    properties,

    The RTP control protocol (RTCP), to monitor the quality of service and to

    convey information about the participants in an on-going session.

    The diagram shown below in Figure 2.4 - RTP and the OSI Model

    shows how RTP is incorporated into the OSI model. RTP fits into the session layer of

    the model, between the application layer and the transport layer. RTP and RTCP work

    independently of the underlying Transport Layer and Network Layer protocols.

    Information in the RTP header tells the receiver how to reconstruct the data and

    describes how the codec bit streams are packetized.


    Figure 2.4 - RTP and the OSI Model

    2.2.2 Some RTP Definitions

    RTP payload: The data transported by RTP in a packet, for example audio

    samples or compressed video data.

    RTP packet: A data packet consisting of the fixed RTP header, a possibly

    empty list of contributing sources, and the payload data. Some underlying

    protocols may require an encapsulation of the RTP packet to be defined.

    Typically one packet of the underlying protocol contains a single RTP packet,

    but several RTP packets may be contained if permitted by the encapsulation

    method.

    RTCP packet: A control packet consisting of a fixed header part similar to

    that of RTP data packets, followed by structured elements that vary depending

    upon the RTCP packet type. Typically, multiple RTCP packets are sent

    together as a compound RTCP packet in a single packet of the underlying

    protocol; this is enabled by the length field in the fixed header of each RTCP

    packet.

    Port: The abstraction that transport protocols use to distinguish among

    multiple destinations within a given host computer. TCP/IP protocols identify

    ports using small positive integers. RTP depends upon the lower-layer protocol

    to provide some mechanism such as ports to multiplex the RTP and RTCP

    packets of a session.


    Transport address: The combination of a network address and port that

    identifies a transport-level endpoint, for example an IP address and a UDP

    port. Packets are transmitted from a source transport address to a destination

    transport address.

    RTP session: The association among a set of participants communicating with RTP. For each participant, the session is defined by a particular pair of

    destination transport addresses (one network address plus a port pair for RTP

    and RTCP). The destination transport address pair may be common for all

    participants, as in the case of IP multicast, or may be different for each, as in

    the case of individual unicast network addresses plus a common port pair. In a

    multimedia session, each medium is carried in a separate RTP session with its

    own RTCP packets. The multiple RTP sessions are distinguished by different

    port number pairs and/or different multicast addresses.

    Synchronization source (SSRC): The source of a stream of RTP packets,

    identified by a 32-bit numeric SSRC identifier carried in the RTP header so as

    not to be dependent upon the network address. All packets from a

    synchronization source form part of the same timing and sequence number

    space, so a receiver groups packets by synchronization source for playback.

    Examples of synchronization sources include the sender of a stream of packets

    derived from a signal source such as a microphone or a camera, or an RTP

    mixer. A synchronization source may change its data format, e.g., audio

    encoding, over time. The SSRC identifier is a randomly chosen value meant to

    be globally unique within a particular RTP session. A participant need not use

    the same SSRC identifier for all the RTP sessions in a multimedia session; the

    binding of the SSRC identifiers is provided through RTCP. If a participant

    generates multiple streams in one RTP session, for example from separate

    video cameras, each must be identified as a different SSRC.

    Contributing source (CSRC): A source of a stream of RTP packets that has

    contributed to the combined stream produced by an RTP mixer. The mixer

    inserts a list of the SSRC identifiers of the sources that contributed to the

    generation of a particular packet into the RTP header of that packet. This list is

    called the CSRC list. An example application is audio conferencing where a

    mixer indicates all the talkers whose speech was combined to produce the

    outgoing packet, allowing the receiver to indicate the current talker, even


    though all the audio packets contain the same SSRC identifier (that of the

    mixer).

    End system: An application that generates the content to be sent in RTP

    packets and/or consumes the content of received RTP packets. An end system

    can act as one or more synchronization sources in a particular RTP session, but typically only one.

    Mixer: An intermediate system that receives RTP packets from one or more

    sources, possibly changes the data format, combines the packets in some

    manner and then forwards a new RTP packet. Since the timing among multiple

    input sources will not generally be synchronized, the mixer will make timing

    adjustments among the streams and generate its own timing for the combined

    stream. Thus, all data packets originating from a mixer will be identified as

    having the mixer as their synchronization source.

    Translator: An intermediate system that forwards RTP packets with their

    synchronization source identifier intact. Examples of translators include

    devices that convert encodings without mixing, replicators from multicast to

    unicast, and application-level filters in firewalls.

    Monitor: An application that receives RTCP packets sent by participants in an

    RTP session, in particular the reception reports, and estimates the current

    quality of service for distribution monitoring, fault diagnosis and long-term

    statistics. The monitor function is likely to be built into the application(s)

    participating in the session, but may also be a separate application that does not

    otherwise participate and does not send or receive the RTP data packets. These

    are called third party monitors.

    2.2.3 RTP Data Structures

    Figure 2.5 below shows the structure of an RTP packet; explanations of the

    different components are given first.

    V is the Version, which identifies the RTP version.

    P is the Padding for the protocols or algorithms that require a packet to be a

    specific size. The padding field is a variable field that, when set, indicates that

    the space at the end of the payload is padded with octets to make the packet the

    proper size.


    X is the Extension bit; when set, the fixed header is followed by exactly one

    header extension with a defined format.

    CSRC count contains the number of CSRC identifiers that follow the fixed

    header.

    M is the Marker, whose interpretation is defined by a profile; it is intended to allow significant events such as frame boundaries to be marked in the packet

    stream.

    Payload type - Identifies the format of the RTP payload and determines its

    interpretation by the application. A profile specifies a default static mapping of

    payload type codes to payload formats. Additional payload type codes may be

    defined dynamically through non-RTP means.

    Sequence number increments by one for each RTP data packet sent, and may

    be used by the receiver to detect packet loss and to restore packet sequence.

    Timestamp reflects the sampling instant of the first octet in the RTP data

    packet. The sampling instant must be derived from a clock that increments

    monotonically and linearly in time to allow synchronization and jitter

    calculations.

    SSRC is an identifier that is chosen randomly, with the intent that no two

    synchronization sources within the same RTP session have the same SSRC

    identifier.

    CSRC identifies the contributing sources for the payload contained in this

    packet. This is another layer of identification for sessions that have the same

    SSRC number, but the data in the stream needs to be differentiated further.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |        Sequence Number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Synchronization Source (SSRC) Identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|            Contributing Source (CSRC) Identifiers             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Payload Packet                         |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 2.5 - RTP Packet Header Format
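For illustration, the hypothetical helper below (not taken from the project code; the class name and field handling are purely illustrative) unpacks the fixed twelve-octet header of Figure 2.5 from a received packet buffer in Java:

import java.util.Arrays;

// Hypothetical helper: unpacks the fixed RTP header laid out in Figure 2.5.
public final class RtpHeader {
    public final int version;        // V: 2 bits
    public final boolean padding;    // P: 1 bit
    public final boolean extension;  // X: 1 bit
    public final int csrcCount;      // CC: 4 bits
    public final boolean marker;     // M: 1 bit
    public final int payloadType;    // PT: 7 bits
    public final int sequenceNumber; // 16 bits
    public final long timestamp;     // 32 bits
    public final long ssrc;          // 32 bits

    public RtpHeader(byte[] p) {
        int b0 = p[0] & 0xFF, b1 = p[1] & 0xFF;
        version        = b0 >> 6;
        padding        = (b0 & 0x20) != 0;
        extension      = (b0 & 0x10) != 0;
        csrcCount      = b0 & 0x0F;
        marker         = (b1 & 0x80) != 0;
        payloadType    = b1 & 0x7F;
        sequenceNumber = ((p[2] & 0xFF) << 8) | (p[3] & 0xFF);
        timestamp      = readUInt32(p, 4);
        ssrc           = readUInt32(p, 8);
        // CSRC identifiers, if csrcCount > 0, follow at offset 12.
    }

    private static long readUInt32(byte[] p, int off) {
        return ((long) (p[off] & 0xFF) << 24) | ((p[off + 1] & 0xFF) << 16)
             | ((p[off + 2] & 0xFF) << 8)     | (p[off + 3] & 0xFF);
    }
}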


    2.2.4 RTP Control Protocol

    The RTP Control Protocol (RTCP) works by transmitting periodically to all

    participants in the session, control packets, in much the same manner as data packets

are transmitted. RTCP performs four functions:

1. It provides feedback on the quality of the data distribution.

2. It carries a persistent transport-level identifier for an RTP source called the canonical name or CNAME.

3. By having each participant send its control packets to all the others, each can independently observe the number of participants; this number is used to calculate the rate at which the packets are sent.

4. It conveys minimal session control information; this function is optional. RTCP serves as a convenient channel to reach all the participants, but it is not necessarily expected to support all the control communication requirements of an application.

    Functions 1-3 are mandatory when RTP is used in the IP multicast environment, and

    are recommended for all environments. RTP application designers are advised to avoid

mechanisms that can only work in unicast mode and will not scale to larger numbers of participants.

    RTCP Packet Format

    As mentioned above, RTCP packets are sent periodically to all participants as well as

    the data packets. There are a number of types of RTCP packets:

    Sender Report

    Receiver Report

    Source Description

    Bye

    Application-specific

    All participants in a session send RTCP packets. A participant that has recently sent

    data packets issues a Sender Report (SR). The sender report contains the total number

    of packets and bytes sent as well as information that can be used to synchronize media

    streams from different sessions. The structure of the RTCP SR is shown in Figure 2.6

    below. It consists of three sections, possibly followed by a fourth profile-specific

    extension section if defined.


    The first section, the header, is 8 octets long, with the following fields:

    The version (V) is 2 bits and identifies the version of RTP, which is the same

    in RTCP packets as in RTP data packets.

    The padding (P) is 1 bit, if the padding bit is set, this RTCP packet contains

some additional padding octets at the end which are not part of the control information. The last octet of the padding is a count of how many padding

    octets should be ignored.

    The reception report count (RC) is 5 bits and represents the number of

    reception report blocks contained in this packet.

    The packet type (PT) is 8 bits and contains the constant 200 to identify this as

    an RTCP SR packet.

The length is 16 bits: the length of this RTCP packet in 32-bit words minus one,

    including the header and any padding.

    The SSRC is 32 bits and is the synchronization source identifier for the

    originator of this SR packet.

    The second section, the sender information, is 20 octets long and is present in every

    sender report packet. It summarizes the data transmissions from this sender and has the

    following fields:

    The NTP timestamp is 64 bits and indicates the wallclock time when this

    report was sent so that it may be used in combination with timestamps returned

    in reception reports from other receivers to measure round-trip propagation to

    those receivers.

    The RTP timestamp is 32 bits and corresponds to the same time as the NTP

    timestamp (above), but in the same units and with the same random offset as

    the RTP timestamps in data packets.

    The sender's packet count is 32 bits and is the total number of RTP data

    packets transmitted by the sender since starting transmission up until the time

    this SR packet was generated. The count is reset if the sender changes its

    SSRC identifier.

    The sender's octet count is 32 bits and is the total number of payload octets

    (i.e., not including header or padding) transmitted in RTP data packets by the

    sender since starting transmission up until the time this SR packet was

    generated. The count is reset if the sender changes its SSRC identifier. This

    field can be used to estimate the average payload data rate.


    The third section contains zero or more reception report blocks depending on the

    number of other sources heard by this sender since the last report. Each reception

    report block conveys statistics on the reception of RTP packets from a single

    synchronization source. Receivers do not carry over statistics when a source changes

    its SSRC identifier due to a collision. These statistics are:

    The SSRC_n (source identifier) is 32 bits and is the SSRC identifier of the

    source to which the information in this reception report block pertains.

    The fraction lost is 8 bits and is the fraction of RTP data packets from source

    SSRC_n lost since the previous SR or RR packet was sent, expressed as a fixed

    point number with the binary point at the left edge of the field.

    The cumulative number of packets lost is 24 bits and is the total number of

    RTP data packets from source SSRC_n that have been lost since the beginning

    of reception. This number is defined to be the number of packets expected less

    the number of packets actually received, where the number of packets received

    includes any which are late or duplicates.

    The extended highest sequence number received is 32 bits. The low 16 bits

    contain the highest sequence number received in an RTP data packet from

    source SSRC_n, and the most significant 16 bits extend that sequence number

    with the corresponding count of sequence number cycles.

    The interarrival jitter is 32 bits and is an estimate of the statistical variance of

    the RTP data packet interarrival time, measured in timestamp units and

    expressed as an unsigned integer. The interarrival jitter J is defined to be the

    mean deviation (smoothed absolute value) of the difference D in packet

    spacing at the receiver compared to the sender for a pair of packets.

    The last SR timestamp (LSR) is 32 bits and is the middle 32 bits out of 64 in

    the NTP timestamp received as part of the most recent RTCP sender report

    (SR) packet from source SSRC_n. If no SR has been received yet, the field is

    set to zero.

    The delay since last SR (DLSR) is 32 bits and is expressed in units of 1/65536

    seconds, between receiving the last SR packet from source SSRC_n and

    sending this reception report block. If no SR packet has been received yet from

    SSRC_n, the DLSR field is set to zero.


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|    RC   |   PT=SR=200   |            Length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        SSRC of Sender                         |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|             NTP Timestamp, most significant word              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             NTP Timestamp, least significant word             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         RTP Timestamp                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Sender's Packet Count                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Sender's Octet Count                      |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|                 SSRC_1 (SSRC of first source)                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Fraction Lost |     Cumulative number of packets lost         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Extended highest sequence number received           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Interarrival Jitter                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         Last SR (LSR)                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Delay since last SR (DLSR)                  |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|                 SSRC_2 (SSRC of second source)                |
|                             ....                              |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|                  profile-specific extensions                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 2.6 - RTCP Sender Report Structure

    Session participants periodically issue Receiver Reports (RR) for all of the sources

    from which they are receiving data packets. A receiver report contains information

    about the number of packets lost, the highest sequence number received, and a

    timestamp that can be used to estimate the round-trip delay between a sender and the

    receiver. The format of the receiver report (RR) packet, as shown in Figure 2.7 below,

    is the same as that of the SR packet except that the packet type field contains the

    constant 201 and the five words of sender information are omitted (these are the NTP

    and RTP timestamps and sender's packet and octet counts). The remaining fields have

the same meaning as for the SR packet. An empty RR packet (RC = 0) is put at the head of a compound RTCP packet when there is no data transmission or reception to

    report.


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|    RC   |   PT=RR=201   |            Length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        SSRC of Sender                         |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|                 SSRC_1 (SSRC of first source)                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Fraction Lost |     Cumulative number of packets lost         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Extended highest sequence number received           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Interarrival Jitter                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         Last SR (LSR)                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Delay since last SR (DLSR)                  |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|                 SSRC_2 (SSRC of second source)                |
|                             ....                              |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|                  profile-specific extensions                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 2.7 - RTCP Receiver Report Structure
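Of these report fields, the interarrival jitter is the one a receiver must compute continuously. A minimal sketch of the estimator defined above, updating J as J + (|D| - J)/16 for each arriving packet and assuming arrival times have already been converted to RTP timestamp units, could look like this (the class is illustrative, not part of the application):

// Illustrative interarrival jitter estimator. D compares the packet
// spacing seen at the receiver with the spacing implied by the sender's
// RTP timestamps; J is the smoothed mean deviation of D.
public final class JitterEstimator {
    private double jitter = 0.0;        // J, in timestamp units
    private long prevArrival = -1;      // previous arrival time (receiver clock)
    private long prevRtpTimestamp = -1; // previous RTP timestamp (sender clock)

    public void onPacket(long arrival, long rtpTimestamp) {
        if (prevArrival >= 0) {
            // D = (Rj - Ri) - (Sj - Si): difference in packet spacing
            long d = (arrival - prevArrival) - (rtpTimestamp - prevRtpTimestamp);
            jitter += (Math.abs(d) - jitter) / 16.0;  // smoothed absolute value
        }
        prevArrival = arrival;
        prevRtpTimestamp = rtpTimestamp;
    }

    // Value reported in the 32-bit interarrival jitter field.
    public long reportedJitter() { return (long) jitter; }
}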

    2.2.5 Alternatives to RTP

Once JMF had been chosen, there was really no better option than the Real-time Transport Protocol. It would have been possible to implement a proprietary protocol over UDP or TCP using the custom packetizers provided by JMF. However, TCP is not suitable for real-time data because of the delays it introduces through packet retransmission, and UDP is unsuitable without higher-level features to deal with packet sequencing and loss. Another alternative to RTP could have been RTSP; however, JMF offers only limited support for it.
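Since RTP through JMF was the route taken, the sketch below shows the typical JMF calls for opening an RTP session and transmitting a stream. The data source would be the output of a realized processor, and the host and port values are placeholders; a full transmitter would also handle the processor life cycle and session events:

import java.net.InetAddress;
import javax.media.protocol.DataSource;
import javax.media.rtp.RTPManager;
import javax.media.rtp.SendStream;
import javax.media.rtp.SessionAddress;

// Hedged sketch of RTP transmission through JMF, not the project's code.
public class RtpTransmitSketch {
    public static SendStream transmit(DataSource dataSource,
                                      String remoteHost, int rtpPort) throws Exception {
        RTPManager mgr = RTPManager.newInstance();
        // Bind the local end of the session; RTCP implicitly uses port + 1.
        mgr.initialize(new SessionAddress(InetAddress.getLocalHost(), rtpPort));
        // Register the peer to which RTP data and RTCP reports are sent.
        mgr.addTarget(new SessionAddress(InetAddress.getByName(remoteHost), rtpPort));
        // Wrap the first stream of the data source and start transmission.
        SendStream stream = mgr.createSendStream(dataSource, 0);
        stream.start();
        return stream;
    }
}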

    2.2.6 Summary

The Real-time Transport Protocol is far more extensive than the description above suggests. However, for the way it is used within this project, the detail given here is more than adequate. It is important to understand the packet structures shown, as these form the basis on which all data within the system is sent.


    2.3 Audio Encoding Scheme G.723.1

    2.3.1 Introduction

    As mentioned earlier, the audio encoding scheme which was chosen was G.723.1. This

    format is ideal for compressing the audio signal component of multimedia services at a

    very low bit rate. In this application it will be used for the audio side of the video

    conferencing. The coder that is used was designed to represent speech with a high

    quality using a limited amount of complexity. It is not ideal for audio signals other

    than speech, for example music, but can be used for them.

The coder can operate at one of two bit rates, either 5.3 kbit/s or 6.3 kbit/s. The higher bit rate gives better quality; the lower one, whilst still maintaining adequate quality, offers more flexibility to the designer. Both rates must be implemented in both the encoder and the decoder. [3]

    Audio signals are encoded by the coder in 30 msec frames; there is also a look ahead

    of 7.5 msec. This results in a total delay of 37.5 msec. Any additional delays in the

    operation and implementation of the coder can be attributed to:

    actual time spent processing the data in the encoder and decoder,

    transmission time on the communication link,

    additional buffering delay for the multiplexing protocol.
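To put figures on this: at 6.3 kbit/s, each 30 msec frame carries 6300 bit/s × 0.030 s = 189 bits, transmitted as a 24-octet frame, while at 5.3 kbit/s a frame is 159 bits, carried in 20 octets. The 37.5 msec figure above is therefore an algorithmic minimum, on top of which the three contributions listed are added.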

    2.3.2 Encoder Principles

    The block diagram of the encoder is shown in Figure 2.8 below. As can be seen there

    are a number of different blocks, the functions of which are beyond the scope of this

    project.


Figure 2.8 - G.723.1 Encoder

    2.3.3 Decoder Principles

    The block diagram of the decoder is shown below, in Figure 2.9. It is just shown for

diagrammatic purposes, and the functions of the blocks do not need to be understood for this project.

    Figure 2.9 - G723.1 Decoder


    2.3.4 Alternative Audio Encoding Schemes

    As shown in Table 2.2 JMF Common Audio Formats, the only other format with the

    required low bandwidth is GSM. The reason that this format was not chosen, is that

    G.723 mono is a better quality. This was the only reason for choosing the scheme that

    was chosen. ADPCM(DVI, IMA4) and Mu-Law are also suitable for RTP data,however they do not meet the low bandwidth requirements.

    2.3.5 Summary

    This format was well chosen as it is ideal for the purpose that it will be used for within

    this application, which is basically the voice part of the video conferencing. Although

    it is possible to go very deep into the workings of the coder and decoder, it is not

    necessary for this project. It is sufficient to know the basics of how it works and what

    it is suitable to be used for.

    2.4 Video Encoding Scheme H.263

    2.4.1 Introduction

    The H.263 format is ideal for encoding video images without much movement, at low

    bit rates. Pictures are sampled at an integer multiple of the video line rate. This

    sampling clock and the digital network clock are asynchronous. The transmission

    clock is provided externally. The video bit rate may be variable. [4]

    2.4.2 Summary of Operation

    The diagram in Figure 2.10 shows an H.263 baseline encoder. The algorithms

involved in the operation of this encoder are far beyond the scope of this project. It is

    sufficient to know that it exists and is used in the encoding scheme.


    Figure 2.10 - H.263 Baseline Encoder

    2.4.3 Alternative Video Encoding Schemes

    As shown in Table 2.1 JMF Common Video Formats, the other video formats

supported by RTP include H.261 and JPEG; however, neither of these meets the low

    bandwidth requirement. At the beginning of the project, it was thought that MPEG

    would be used. The reason that this was not chosen is that MPEG does not support

    capture from a live video source. It would only support a pre-recorded video or

    capture from an MPEG enabled data source. This would not have been suitable for

    video calls.

    2.4.4 Summary

    H.263 can be used for compressing the moving picture component of audio-visual

    services at low bit rates. It is ideal for uses in video conferencing as there is not much

    movement involved and low bit rates are used. This makes it the ideal encoding

    scheme for this application.

2.5 Image Observation

    2.5.1 Initial Ideas

    Initially, it was thought that some kind of motion detection algorithm would be used to

    implement the image observation feature. A number of possibilities were looked into

    when researching this prospect, some of which included:


    Motion Estimation: used to predict frames within a video sequence using

    previous frames, with the help of motion vectors. The use of motion vectors

means that only the changes in the frames are sent, as opposed to the whole

    frame.

Fixed Size Block Matching: each image frame is divided into a fixed number of blocks. For each block in the frame, a search is made in the reference frame

    over an area of the image for the best matching block, to give the least

    prediction error.

    Motion Compensation: motion compensation uses blocks from a past frame

    to construct a replica of the current frame. For each block in the current frame

    a matching block is found in the past frame and if suitable, its motion vector is

    substituted for the block during transmission.

    After examining the specification for H.263, it was discovered that there were motion

    detection and compensation algorithms built into it. This meant that the algorithm did

not have to be coded; it was already there and available to use. RTCP reports were

    used to show the byte rate of the video stream, which was then used to implement the

    image observation.

    2.5.2 The Way it Works

    Basically, the H.263 video encoding scheme was used in the implementation of the

    image observation. The motion estimation and compensation that is built into the

    format was used [1]. This assumes that the pixels within a current picture can be

    modelled as a translation of those within a previous picture. Each macroblock is

    predicted from the previous frame. The concept of macroblocks is explained below in

    Figure 2.11. Each pixel within the macroblock undergoes the same amount of

    translational motion, which is represented by two-dimensional motion vectors or

    displacement vectors.


    Figure 2.11 - Macroblocks within H.263
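For illustration, a QCIF picture (176 × 144 pixels, a size typical of video conferencing) divides into (176/16) × (144/16) = 11 × 9 = 99 macroblocks of 16 × 16 pixels, each of which is predicted from the previous frame with its own motion vector.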

    The basic idea behind the motion detection is shown in Figure 2.12 below.

    Figure 2.12 - Motion Prediction


    The way that the above was used for the image observation is as follows. When a

frame hasn't changed, a reference to a previous frame is sent. Basically, the image

    observation feature exploits the temporal redundancy inherent in a video sequence.

    The redundancy is larger when a camera is focused on an image that does not contain

    a lot of movement. This is the case when a user leaves the shot. This redundancy is

    reflected in a reduced RTCP byte rate.

This reference frame is then displayed, which requires a lower byte rate than sending a new frame. The RTCP reports monitor the byte rate of the video stream. If the byte

    rate drops, and stays dropped for a certain period of time, then the call is ended. The

    procedure to end the call is explained in more detail in section 3.5.
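One possible realisation of that monitoring loop is sketched below, using JMF's global reception statistics. The threshold and patience values are illustrative tuning constants, and endCall() is a hypothetical hook into the teardown described in section 3.5:

import javax.media.rtp.GlobalReceptionStats;
import javax.media.rtp.RTPManager;

// Illustrative sketch (assumed names, simplified timing): sample the
// reception statistics once per second and end the call when the byte
// rate stays below a threshold for too long.
public class PresenceMonitor implements Runnable {
    private static final long THRESHOLD_BYTES_PER_SEC = 2000; // assumed tuning value
    private static final int  PATIENCE_SECONDS = 10;          // assumed tuning value

    private final RTPManager manager;
    private long lastTotal = 0;
    private int quietSeconds = 0;

    public PresenceMonitor(RTPManager manager) { this.manager = manager; }

    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                Thread.sleep(1000);
                GlobalReceptionStats stats = manager.getGlobalReceptionStats();
                long total = stats.getBytesRecd();
                long rate = total - lastTotal;   // bytes received in the last second
                lastTotal = total;
                quietSeconds = (rate < THRESHOLD_BYTES_PER_SEC) ? quietSeconds + 1 : 0;
                if (quietSeconds >= PATIENCE_SECONDS) {
                    endCall();                   // see section 3.5
                    return;
                }
            }
        } catch (InterruptedException ignored) { }
    }

    private void endCall() { /* hypothetical hook into the call-teardown logic */ }
}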

2.6 Multicasting

    For the conferencing feature of this application, multicasting was used. All of the

    participants within the conference transmit to a multicast address.

    2.6.1 Alternatives to Multicasting

    Another option that was looked at for the conferencing feature was to just allow all

    participants to transmit and receive from and to each other at the same time. This setup

    is shown in Figure 2.13. Basically, when the conference button was pressed, one call

    would have been able to set up on top of another, so that two calls could take place

    simultaneously and that participants would be able to listen for all streams.


    Figure 2.13 - Original Conferencing Plan

    As could be imagined, this method would be very cumbersome. It would use a lot of

system resources, as there would be an unnecessary number of streams being sent. It is

    impractical for a user to have to transmit their data more than once. This idea was

    decided against.

    2.6.2 What is Multicasting

    Multicasting is when a packet is sent to a host group, which is a set of hosts

    identified by a single IP address. A multicast datagram is then delivered to all

    members of the destination group [2]. Hosts may join or leave the group at any time as

membership of the group is dynamic. A host can be a member of more than one group at a

    time and a host does not need to be a member of a group to send datagrams to it. There

    are two types of host groups; a permanent host group is one which has a well-known,

administratively assigned IP address. A permanent host group can

    have any number of members, even zero. The remainder of the IP addresses are

    available for dynamic assignment to the other type of group, which is known as a

    transient group. This second type of group only exists as long as it has members.

    The forwarding of IP multicast datagrams is handled by multicast routers. When a

    datagram is transmitted by the host, it is sent as a local network multicast and will be

    delivered to all members of the destination host group. The addresses which are

    allocated to the host groups are known as class D IP addresses and range from


    224.0.0.0 to 239.255.255.255. The diagram in Figure 2.14 shows how the data is

    distributed to all members of the group.

    Figure 2.14 - Multicasting through Router
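As a minimal illustration, the fragment below joins a transient host group using Java's MulticastSocket, sends a single datagram to the group and receives one back. The class D address and port are arbitrary example values, not the application's own:

import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;

// Minimal sketch of transient host group membership.
public class MulticastSketch {
    public static void main(String[] args) throws Exception {
        InetAddress group = InetAddress.getByName("224.1.2.3"); // class D address
        MulticastSocket socket = new MulticastSocket(5000);
        socket.joinGroup(group);                  // membership is dynamic

        // Any datagram sent to the group address reaches every member.
        byte[] hello = "hello conference".getBytes();
        socket.send(new DatagramPacket(hello, hello.length, group, 5000));

        // Receive one datagram forwarded by the multicast routers.
        byte[] buf = new byte[1500];
        DatagramPacket in = new DatagramPacket(buf, buf.length);
        socket.receive(in);

        socket.leaveGroup(group);                 // hosts may leave at any time
        socket.close();
    }
}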

    2.7 Summary

    The information contained within this chapter has been an invaluable asset in

    developing this application. A firm understanding of all the standards was required

    before coding could even begin. JMF placed a lot of restrictions on the standards that

    could be used. JMF does provide the ability to implement custom packetizers and

custom encoders; however, to do so would have been time-consuming and unnecessary for

    this application.


    Chapter 3

3 Design of the System

3.1 System Architecture

The system as it stands consists of two different communication architectures. One is

    client to server and the other is client to client. The reason that there are two different

    methods is to make the system as efficient as possible. There was the possibility of

    using client to server for all communication; however it was felt that this would be

inefficient, as the server did not need to be part of a call between two clients; it would

    have been an unnecessary use of system resources. For this reason, calls between

    clients are peer to peer and all other communication goes through the server.

    3.1.1 Client to Server Communication

    This architecture is used for all system messages, for the setting up of calls, etc;

    basically for everything other than actual calls. There will be one server and there can

    be any number of clients connected to that server. The client to server configuration is

    shown in Figure 3.1.

    Figure 3.1 - Client to Server Communication


    The connections between the server and the clients are bidirectional TCP connections.

    It was not necessary to use RTP here as they are not real time connections. RTP is

    described in section 2.2 as being ideal for real time communication. The messages that

    are sent between the clients and server will include login, logoff, messages to be sent,

    calls to be made etc. which are not time dependent. The server plays an integral part in

    the system. Basically, all communication between any two clients must first go

    through the server. So if a client wishes to call another client, they must send a call

    request to the server. The server will then proceed to set up the call between the

    clients. The code for this is shown in Appendix 1 in section 7.1. Also included in

    Appendix 1 are the code extracts for login request (section 7.2), logoff request (section

    7.3), call end request (section 7.4), conference setup request (section 7.5), request to

    add a participant to a conference (section 7.6), request to end a conference (section

7.7), request to send a message (section 7.8) and request to receive a message (section 7.9).
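For orientation, the fragment below sketches the general shape of one such exchange over a plain TCP socket. The port number and message strings are illustrative only, not the application's actual protocol constants:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

// Illustrative server loop: accept a client connection, read one request
// line (e.g. a login or call request) and answer with a confirmation or
// an error. The client side simply opens a Socket to the same port and
// performs the mirrored readLine()/println() calls.
public class ServerSketch {
    public static void main(String[] args) throws Exception {
        ServerSocket listener = new ServerSocket(4444);   // assumed port
        while (true) {
            Socket client = listener.accept();            // one connection per client
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(client.getInputStream()));
            PrintWriter out = new PrintWriter(client.getOutputStream(), true);
            String request = in.readLine();               // e.g. "CALL alice bob"
            if (request != null) {
                out.println("OK " + request);             // confirmation / error reply
            }
            client.close();
        }
    }
}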

    The purpose of including these code extracts is to show that the server really does

    control everything that the clients want to do. It will be the server that will check if the

    other party is online and available, and the server that will set up the call. If a client is

    unavailable when a message is sent, the server will store the message until they

    become available and will then forward it on. Some might ask why a server is

    required, why not just let the clients communicate directly. This was basically a design

    choice. It was the opinion of the developer that direct client to client communication

for all tasks would mean that the load on the clients would be quite large, which was

    unnecessary. If it was up to the clients to do everything, then the system would be

slowed down significantly. The server will act as a centre point, where clients can

    contact each other. Without the server, the clients would have difficulty in contacting

    another client. It was also a lot more efficient to let the server take some of the load

    and leave all administration to the server, leaving the clients free to partake in calls,

    send messages etc. It also meant that messages could be sent while clients are on calls

    because the server can store the message, and messages can also be sent when the

    receiving client is offline and stored until their next login, something that would not

    have been possible without a server.


    3.1.2 Client to Client Communication

    The other form of communication built into the system architecture is direct

    communication between two clients. This will only occur at one time, during a call. As

    shown in the previous section, the server is required to set up the call. However, once

the call has been set up, the server drops out and the streams are sent directly between clients. The client to client architecture is shown in Figure 3.2. This type of

    communication exists solely for calls. This is where the real-time transport protocol

    discussed in section 2.2 will be employed. Voice and video streams are synchronised.

    This is done within the RTP protocol. In section 2.2.4, the section on RTCP, the

    canonical name (CNAME) was described as being an identifier for every stream.

    When two streams, namely a voice and a video, have the same CNAME, it implies

    that they are being sent from the same source and they are automatically synchronized.

    This was important for the application as it is a fundamental expectation of a video

    conference that the voice and video will be synchronised.

Figure 3.2 - Client to Client Communication

    Once again, the decision to choose this architecture can be justified. It would have

    been possible to let the server remain during the call but it would have been of no

    benefit. The decision to remove the server from the call reduced the server complexity

    and decreased the system load.

    3.2 System Design

    When undertaking a software project such as the one described in this report, it is

    important to have a good design brief. One of the most effective ways to design such a

    system is to create class diagrams. These clearly show the methods that are part of

    each class as well as the relationships between classes, and can give an in depth

    understanding of the overall system.


    3.2.1 The Server

The first class diagram, shown in Figure 3.3, is of the server and its related classes.

    As described in the previous section, the server plays an integral part in the overall

    functionality of the system.

    Figure 3.3 - Server Class Diagram

    As can be seen, the server is the parent class and there are three child classes,

    ServerHandle, ServerSideUtilities and ServerSideStorage. There

    can only be one server, and the methods within the server are mainly just for the

    graphical user interface and will be inherited by the three child classes. The

ServerHandle is the interface a client communicates with for access to server side

    resources. The server handle is responsible for the way messages are sent and is

    responsible for telling the server what to do when it receives a message, depending on

    the type of message received. There are two types of message, a push message and a


    pull message, and both these messages can be sent or received. There are push and

    pull links on all clients and on the server. The reason is that normally in a client server

    application, only the client can start communication with the server, but by using push

    and pull either can initiate communication. A push message is sent by either the client

    or server, depending on who initialises communication, and the response to a push

    message is a pull message. The person who sends a push will receive back a push, and

    the person who sends a pull will receive a pull. An example is shown below, in Figure

    3.4. This type of communication would be used in a situation where the user presses a

    button on the client side that initializes communication with the server. However,

    within this application, there will usually only be one send and one receive per task

    (request and confirmation / error), as opposed to two of each, as shown in the diagram.

    Figure 3.4 - Example of Push Pull Message Setup

It should be noted that the layout of the diagram above is not the only one available;

    the client and server could be reversed, with the server sending the push and the client

    sending the pull. A typical example of this is when a server receives a message which

    has to be pushed to its destination; this is the case for UMS delivery. The push and

    pull messages are dealt with by the ServerHandle. There can be many

ServerHandles associated with one server, as there is one created for every client

    that con