project sampal
TRANSCRIPT
-
8/21/2019 Project Sampal
1/99
Video Conferencing System
with
Multimedia Capabilities
Janet Adams
April 2005
BACHELOR OF ENGINEERING
IN
TELECOMMUNICATIONS ENGINEERING
Supervised by Dr. Derek Molloy
Acknowledgements
I would like to thank Dr. Derek Molloy, who supervised me on this project, for his
enthusiasm and guidance. I would also like to thank Edward Casey, whom I
collaborated with on certain areas of the project, and his supervisor, Dr. Gabriel
Muntean, for his support and advice. My thanks also go to my friends Edward Casey, Edel Harrington and Hector Climent for listening to me and guiding me through my
initial presentation. I would like to dedicate this project to my parents, who have
supported me throughout all my time in college and especially during this, my final
year.
Declaration
I hereby declare that, except where otherwise indicated, this document is entirely my
own work and has not been submitted in whole or in part to any other university.
Signed: ...................................................................... Date: ......................................
Abstract
This document will describe the development of a video conferencing system with
multimedia capabilities. The concept of multicasting will be explored as this was used
in the development of the video conferencing. Other concepts, which were used in the
development of the system, such as the Java Media Framework, the Real-time Transport Protocol and a number of encoding schemes, will also be investigated.
The design of the system will explain how each of the features was planned for and
developed, and will provide the user with an understanding of video conferencing,
client-server communications, motion detection and much more. The implementation
section reads like a user manual. On completion of this section, the reader should be
able to make full use of all of the features within the application and should
understand the depth to which each of the features can be used.
When this document has been read, the reader will fully understand both how the
system was developed and how it can be used, as well as the technical principles
behind how the different features work.
Table of Contents
ACKNOWLEDGEMENTS.........................................................................................II
DECLARATION..........................................................................................................II
ABSTRACT ................................................................................................................III
TABLE OF CONTENTS........................................................................................... IV
TABLE OF FIGURES............................................................................................ VIII
TABLE OF TABLES...................................................................................................X
1 INTRODUCTION.................................................................................................1
1.1 AIM OF THIS PROJECT...................................................................................1
1.2 CURRENT EXAMPLES OF SIMILAR APPLICATIONS ........................................1
1.3 EQUIPMENT AND SOFTWARE ........................................................................2
1.3.1 JBuilder 2005..........................................................................................2
1.3.2 Logitech Webcam....................................................................................2
1.3.3 Laptop .....................................................................................................2
1.3.4 Digital Camcorder ..................................................................................2
2 TECHNICAL BACKGROUND ..........................................................................3
2.1 JAVA MEDIA FRAMEWORK...........................................................................3
2.1.1 Introduction.............................................................................................3
2.1.2 JMF Architecture ....................................................................................5
2.1.3 Principle Elements ..................................................................................6
2.1.4 Common Media Formats.........................................................................9
2.1.5 Real Time Transport Protocol (RTP) Architecture in JMF..................11
2.1.6 Alternatives to JMF...............................................................................15
2.1.7 Summary................................................................................................15
2.2 REAL-TIME TRANSPORT PROTOCOL...........................................................16
2.2.1 Introduction...........................................................................................16
2.2.2 Some RTP Definitions ...........................................................................17
2.2.3 RTP Data Structures .............................................................................19
2.2.4 RTP Control Protocol ...........................................................................21
2.2.5 Alternatives to RTP ...............................................................................25
2.2.6 Summary................................................................................................25
2.3 AUDIO ENCODING SCHEME G.723.1........................................................26
2.3.1 Introduction...........................................................................................26
2.3.2 Encoder Principles................................................................................26
2.3.3 Decoder Principles................................................................................27
2.3.4 Alternative Audio Encoding Schemes ...................................................28
2.3.5 Summary................................................................................................28
2.4 VIDEO ENCODING SCHEME H.263...........................................................28
2.4.1 Introduction...........................................................................................28
2.4.2 Summary of Operation ..........................................................................28
2.4.3 Alternative Video Encoding Schemes....................................................29
2.4.4 Summary................................................................................................29
2.5 IMAGE OBSERVATION ................................................................................29
2.5.1 Initial Ideas ...........................................................................................29
2.5.2 The Way it Works ..................................................................................30
2.6 MULTICASTING ..........................................................................................32
2.6.1 Alternatives to Multicasting..................................................................32
2.6.2 What is Multicasting .............................................................................33
2.7 SUMMARY ..................................................................................................34
3 DESIGN OF THE SYSTEM ..............................................................................35
3.1 SYSTEM ARCHITECTURE ............................................................................35
3.1.1 Client to Server Communication...........................................................35
3.1.2 Client to Client Communication............................................................37
3.2 SYSTEM DESIGN .........................................................................................37
3.2.1 The Server .............................................................................................38
3.2.2 The Client ..............................................................................................40
3.3 MESSAGING STRUCTURE............................................................................41
3.4 CONFERENCING..........................................................................................42
3.5 IMAGE OBSERVATION ................................................................................46
3.6 COMMON PROCEDURES WITHIN THE APPLICATION ....................................47
3.6.1 Login .....................................................................................................47
3.6.2 Call Setup ..............................................................................................48
3.6.3 Call Teardown.......................................................................................49
3.6.4 Logout ...................................................................................................50
3.7 OTHER FEATURES WITHIN THE APPLICATION.............................................50
4 IMPLEMENTATION OF THE SYSTEM .......................................................51
4.1 INTRODUCTION...........................................................................................51
4.2 LOGGING IN ...............................................................................................51
4.3 CALLS ........................................................................................................52
4.3.1 Making a Peer to Peer Call ..................................................................52
4.3.2 Receiving a Person to Person Call .......................................................55
4.3.3 Initiating a Conference Call..................................................................56
4.3.4 Joining a Conference Call ....................................................................57
4.4 MESSAGES..................................................................................................58
4.4.1 Sending an MMS Message ....................................................................58
4.4.2 Receiving an MMS Message .................................................................61
4.4.3 Videomail Messages..............................................................................63
4.5 EXTRA FEATURES ......................................................................................64
4.5.1 Image Observation................................................................................64
4.5.2 Adaption ................................................................................................64
4.6 USING THE SERVER ....................................................................................64
5 RESULTS AND DISCUSSION .........................................................................69
6 CONCLUSIONS AND FURTHER RESEARCH............................................75
6.1 THE BENEFITS OF THIS PROJECT.................................................................75
6.2 THE IMPACT OF THIS PROJECT....................................................................75
6.3 FUTURE RESEARCH POSSIBILITIES .............................................................76
6.4 MEETING THE REQUIREMENTS ...................................................................77
REFERENCES............................................................................................................78
7 APPENDIX 1 .......................................................................................................79
7.1 CALL SETUP REQUEST ...............................................................................79
7.2 LOGIN REQUEST .........................................................................................80
7.3 LOGOFF REQUEST ......................................................................................81
7.4 CALL END REQUEST...................................................................................82
7.5 CONFERENCE SETUP REQUEST ...................................................................83
7.6 ADD PARTICIPANT TO CONFERENCE REQUEST...........................................84
7.7 END CONFERENCE REQUEST ......................................................................85
7.8 SEND MESSAGE REQUEST ...........................................................................87
7.9 RECEIVE MESSAGE REQUEST .....................................................................87
8 APPENDIX 2 .......................................................................................................89
8.1 IMAGE OBSERVATION CODE ......................................................................89
Table of Figures
FIGURE 2.1-MEDIA PROCESSING MODEL.......................................................................4
FIGURE 2.2-SYSTEM PROCESSING MODEL.....................................................................5
FIGURE 2.3-JMF BASIC SYSTEM MODEL.......................................................................6
FIGURE 2.4-RTP AND THE OSI MODEL .......................................................................17
FIGURE 2.5-RTP PACKET HEADER FORMAT ...............................................................20
FIGURE 2.6-RTCP SENDER REPORT STRUCTURE ........................................................24
FIGURE 2.7-RTCP RECEIVER REPORT STRUCTURE .....................................................25
FIGURE 2.8-G.723.1 ENCODER ....................................................................................27
FIGURE 2.9-G.723.1 DECODER ....................................................................................27
FIGURE 2.10-H.263 BASELINE ENCODER ....................................................................29
FIGURE 2.11-MACROBLOCKS WITHIN H.263 ...............................................................31
FIGURE 2.12-MOTION PREDICTION..............................................................................31
FIGURE 2.13-ORIGINAL CONFERENCING PLAN ............................................................33
FIGURE 2.14-MULTICASTING THROUGH ROUTER ........................................................34
FIGURE 3.1-CLIENT TO SERVER COMMUNICATION ......................................................35
FIGURE 3.2-CLIENT TO CLIENT COMMUNICATION ......................................................37
FIGURE 3.3-SERVER CLASS DIAGRAM.........................................................................38
FIGURE 3.4-EXAMPLE OF PUSH PULL MESSAGE SETUP ...............................................39
FIGURE 3.5-CLIENT CLASS DIAGRAM..........................................................................41
FIGURE 3.6-ALLOCATING A CONFERENCE POSITION ...................................................43
FIGURE 3.7-MESSAGE SEQUENCE CHART FOR CONFERENCE CALL .............................44
FIGURE 3.8-CONFERENCING SETUP .............................................................................45
FIGURE 3.9-IMAGE OBSERVATION AVERAGES.............................................................46
FIGURE 3.10-MESSAGE SEQUENCE CHART FOR LOGIN ................................................47
FIGURE 3.11-MESSAGE SEQUENCE CHART FOR CALL SETUP ......................................48
FIGURE 3.12-MESSAGE SEQUENCE CHART FOR CALL TEARDOWN..............................49
FIGURE 3.13-MESSAGE SEQUENCE CHART FOR LOGOUT.............................................50
FIGURE 4.1-LOGIN SCREEN .........................................................................................52
FIGURE 4.2-HOME SCREEN ..........................................................................................53
FIGURE 4.3-MAKING A P2P CALL ...............................................................................54
FIGURE 4.4-DURING A CALL........................................................................................55
FIGURE 4.5-CALL ACCEPT/REJECT .............................................................................56
FIGURE 4.6-INITIATING A CONFERENCE CALL .............................................................57
FIGURE 4.7-CONFERENCE REQUEST ............................................................................58
FIGURE 4.8-MMS SCREEN ..........................................................................................59
FIGURE 4.9-ATTACH BUTTON FILE CHOOSER .............................................................60
FIGURE 4.10-MMS SCREEN READY TO SEND..............................................................61
FIGURE 4.11-UNIFIED INBOX SCREEN.........................................................................62
FIGURE 4.12-MESSAGE POPUP WINDOW .....................................................................62
FIGURE 4.13-LEAVE VIDEOMAIL REQUEST .................................................................63
FIGURE 4.14-VIDEOMAIL COMPOSE ............................................................................63
FIGURE 4.15-VIDEOMAIL POPUP.................................................................................64
FIGURE 4.16-SERVER LOGIN SCREEN ..........................................................................65
FIGURE 4.17-SERVER ACTIVITY SCREEN .....................................................................66
FIGURE 4.18-SERVER CLIENT STATUS SCREEN ...........................................................66
FIGURE 4.19-SERVER CLIENT STATUS SCREEN WITH CLIENTS ....................................67
FIGURE 4.20-SERVER ADMINISTRATION SCREEN ........................................................68
Table of Tables
TABLE 2.1-JMF COMMON VIDEO FORMATS ...............................................................10
TABLE 2.2-JMF COMMON AUDIO FORMATS...............................................................11
TABLE 5.1-TESTING SCENARIOS: LOGIN/LOGOFF ......................................................70
TABLE 5.2-TESTING SCENARIOS: MAKING A CALL ...................................................71
TABLE 5.3-TESTING SCENARIOS: SENDING A MESSAGE ..............................................72
TABLE 5.4-TESTING SCENARIOS: CONFERENCE CALL .................................................73
TABLE 5.5-OTHER TESTING SCENARIOS......................................................................74
Chapter 1
1 Introduction
Most business facilities, for example office blocks, colleges and shopping centres,
have telephone systems installed. These telephone systems allow features such
as call forward, call divert, voicemail, free extension dialling to other users within
the same network, etc. Another object that is found in almost all of these facilities is
computers, usually one per user. Therefore, in the majority of establishments, you will
find that every employee has a telephone handset and a computer. A cost effective and
space saving idea would be to combine these two everyday utilities so that the
computer can also be used as a phone. People want their lives and work to be as
simple and time efficient as possible and one way to achieve this is to have a software
based telephony system on their computers. Why do they need a physical telephone
handset when it is possible to attain all the same features on their computers, cutting
out the expense of the handset?
1.1 Aim of this Project
The aim of this project is to develop a video conferencing facility with multicasting
capability and MMS functionality. The application will be developed in Java, making
use of the Java Media Framework for real time applications. The project will be
developed in conjunction with Edward Casey, who will develop Videomail and
adaption features to add to the system.
1.2 Current Examples of Similar Applications
There are some examples of software based phone systems available. One example is
Skype, an internet phone system. This allows users to have voice conversations, free
of charge, over the internet, provided that the party they are calling is also using the
Skype service. The disadvantage is that a company employing this system would have
no control over their users. Another example is Vonage, which offers the same sort of
service as Skype and hence the same disadvantages.
1.3 Equipment and Software
1.3.1 JBuilder 2005
This is the program that was used to code and compile all of the Java code. It was
chosen because it was available free of charge and was very straightforward to use.
One particularly helpful feature was that it highlighted any common coding errors as
the code was written, which saved a lot of time; otherwise, the developer might not
have been informed of these errors until after compilation.
1.3.2 Logitech Webcam
This was used for the testing of the video calls.
1.3.3 Laptop
Testing was difficult as very few of the features could be tested alone. Almost all
testing required two computers. For this reason, it was most efficient to use two
laptops connected to two webcams.
1.3.4 Digital Camcorder
The digital camcorder was used for the development of the image observation
feature, as the low quality of the webcam introduced noise to the image, which
hampered the calculation of an adequate threshold value.
Chapter 2
2 Technical Background
In this chapter, the various standards used in the design of this system will be
discussed. The standards chosen were based on what was supported by the Java Media
Framework. More suitable options may exist, for example among the encoding
schemes, but the choice was limited to those supported by the Java Media
Framework and the Real-Time Transport Protocol. The standards
discussed within this chapter were the basic building blocks that this project was built
on.
2.1 Java Media Framework
2.1.1 Introduction
It is often the case that a Java developer will want to include some real-time media
within their Java application or applet. Prime examples of such real-time media would
be audio and video. The Java Media Framework (JMF) was developed to enable this
to happen. JMF allows the capture, playback, streaming and transcoding of multiple
media formats. JMF is an extension of the Java platform that provides a powerful
toolkit from which scalable, cross platform applications can be developed.
Any data that changes with respect to time can be characterized as real-time media.
With real-time media, the idea is that you will see it as it happens. So for example, if
you are partaking in a video conference, you expect that there should not be a
significant delay between when the other person says something to you, and when you
hear and see them saying it. Audio clips, MIDI sequences, movie clips, and animations
are common forms of time-based media. Such media data can be obtained from a
variety of sources, such as local or network files, cameras, microphones, and live
broadcasts. Figure 2.1, below, shows a media processing model. There are three main
elements within the system - the input, the output and the processor. Think of the input
as where the data comes from, this could be a capture device such as a video camera, a
file or it could be data that has been received over a network. Before the input can
reach the output, it has to be formatted so that it can be received correctly. This
formatting takes place in the processor. A processor can do many things to the data,
some of which include compressing/decompressing, applying effect filters and
converting into the correct format using the encoding scheme which has been
specified. Once the data has been correctly formatted by the processor, it is then
passed on to the output so that the end user can see or hear it. The output could simply
be a player, such as a speaker or a television, it could save the data to a file or it could
send it across the network.
Figure 2.1 - Media Processing Model
To relate the media processor model shown above to this particular project, let us take
a look at Figure 2.2. As can be seen immediately, this system has more components
than the one shown above. However it can still be divided into the same three parts,
input, processor and output. The input consists of the MediaLocator which
represents the address of the device, and the data source, which is constructed using
the MediaLocator and is the interface to the device. The data is then taken from the
input and sent to the processor. The processor in the system consists of the processor
itself, which takes the data and converts it into the encoding scheme that has been
defined for the system. The other element of the processor is the RTPManager. The
transmission RTPManager takes the encoded data from the processor and packetizes
it, so that it can be sent over the network. The data is then transmitted over the
network where it is met on the other side by the receiver RTPManager, which takes
the data and depacketizes it, converting it back into a format that can be read by the
player. Once this stage has been completed, the data is passed to the output, consisting
of the player and the speaker (the example shown here is for a voice call, the speaker
could be a monitor or any other sort of output device that the media can be seen or
heard on). The player takes the encoded data and decodes it, then sends it to the output
device so that the receiver can see or hear it.
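The transmit half of this pipeline can be sketched in JMF code. This is a hypothetical sketch, not the project's actual implementation: it assumes the JMF extension is installed, and the device locator `vfw://0`, the port 42050 and the peer address 192.168.1.20 are all placeholders.

```java
import java.net.InetAddress;
import javax.media.*;
import javax.media.control.TrackControl;
import javax.media.format.VideoFormat;
import javax.media.protocol.ContentDescriptor;
import javax.media.protocol.DataSource;
import javax.media.rtp.RTPManager;
import javax.media.rtp.SendStream;
import javax.media.rtp.SessionAddress;

public class TransmitSketch {
    public static void main(String[] args) throws Exception {
        // Input: the MediaLocator addresses the capture device, the DataSource wraps it.
        DataSource source = Manager.createDataSource(new MediaLocator("vfw://0"));

        // Processor: transcode the video track to H.263, packaged for RTP.
        Processor processor = Manager.createProcessor(source);
        processor.configure();
        waitFor(processor, Processor.Configured);
        processor.setContentDescriptor(new ContentDescriptor(ContentDescriptor.RAW_RTP));
        for (TrackControl track : processor.getTrackControls()) {
            if (track.getFormat() instanceof VideoFormat) {
                track.setFormat(new VideoFormat(VideoFormat.H263_RTP));
            } else {
                track.setEnabled(false); // ignore non-video tracks in this sketch
            }
        }
        processor.realize();
        waitFor(processor, Processor.Realized);

        // RTPManager: packetise the encoded output and send it to one peer.
        RTPManager rtp = RTPManager.newInstance();
        rtp.initialize(new SessionAddress(InetAddress.getLocalHost(), 42050));
        rtp.addTarget(new SessionAddress(InetAddress.getByName("192.168.1.20"), 42050));
        SendStream stream = rtp.createSendStream(processor.getDataOutput(), 0);
        stream.start();
        processor.start();
    }

    // Crude state wait; real code would use a ControllerListener instead.
    private static void waitFor(Processor p, int state) throws InterruptedException {
        while (p.getState() < state) Thread.sleep(50);
    }
}
```

The receive side mirrors this: an RTPManager depacketizes incoming streams and hands each one to a Player for decoding and rendering.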
Figure 2.2 - System Processing Model
2.1.2 JMF Architecture
The most practical example of real-time media comes from a basic home movie
system. I have shown this system below in Figure 2.3. Imagine someone is making a
home movie: the first thing that they do is record it onto a video tape using a
camcorder. So they are using a capture device (the camcorder) and recording onto a
data source (the video tape).
Once they have made the movie, the next logical thing that they would want to do
would be to watch it. So, thinking of the system processing model, they would need
some sort of processor that would take the data from the data source and convert it into
some format that they can see and hear. This processor would be a VCR. When the
data source is placed into the processor, the data is transmitted to the final stage of the
system processing model: the output. In this case, the television will be the principal
output device. There will more than likely be speakers on the television that will
transmit the audio part of the media. So below we have a very basic processing model
that many people use every day at home.
Figure 2.3 - JMF Basic System Model
Yet even though the model shown in Figure 2.3 seems very basic, it still contains the
main elements of the more complicated system process model that is shown above, in
Figure 2.2.
2.1.3 Principle Elements
Data Source
In JMF, a DataSource is the audio or media source, or possibly a combination of
the two, e.g. a webcam with an integrated microphone. It could also be an incoming
stream across a network, for example the internet, or a file. Once the location or
protocol of the data is determined, the data source encapsulates both the media
location, and the protocol and software used to deliver the media. When a
DataSource is sent to a Player, the Player is unconcerned about the origin of
the DataSource.
There are two types of DataSources, determined by how the data transfer initiates:
Pull data source: Here the data flow is initiated by the client and the data flow
from the source is controlled by the client.
Push data source: Here the data flow is initiated by the server and the data flow
from the source is controlled by the server.
Several data sources can be combined into one. So if you are capturing a live scene
with two data sources: audio and video, these can be combined for easier control.
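Combining sources can be sketched with JMF's merging facility. This is an illustrative fragment only (JMF extension assumed installed); the two device locators shown are placeholders, not the project's actual configuration.

```java
import javax.media.Manager;
import javax.media.MediaLocator;
import javax.media.protocol.DataSource;

public class MergeSketch {
    public static void main(String[] args) throws Exception {
        // Two separate capture sources for one live scene (placeholder locators).
        DataSource video = Manager.createDataSource(new MediaLocator("vfw://0"));
        DataSource audio = Manager.createDataSource(new MediaLocator("javasound://44100"));

        // Merge them into a single DataSource so they can be controlled together.
        DataSource combined =
                Manager.createMergingDataSource(new DataSource[] { video, audio });
        combined.connect();
    }
}
```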
Capture Device
A capture device is the piece of hardware that you would use to capture the data,
which you would connect to the DataSource. Examples would be a microphone or
a webcam. The captured media can then be sent to the Player, converted into
another format or even stored to be used at a later stage.
Like DataSources, capture devices can be either a push or a pull source. If a
capture device is a pull source, then the user controls when to capture the image; if it is
a push source, then the user has no control over when the data is captured: it will be
captured continuously.
Player
As mentioned above, a Player takes a stream of data and renders it to an output
device. A Player can be in any one of a number of states. Usually, a Player would
go from one state to the next until it reaches the final state. The reason for these states
is so the data can be prepared before it is played. JMF defines the following six states
for the Player:
Unrealized: In this state, the Player object has just been instantiated and
does not yet know anything about its media.
Realizing: A Player moves from the unrealized state to the realizing state
when the Player's realize() method is called. In this state, the Player is
in the process of determining its resource requirements.
Realized: Transitioning from the realizing state, the Player comes into the
realized state. In this state the Player knows what resources it needs and has
information about the type of media it is to present. It can also provide visual
components and controls, and its connections to other objects in the system are
in place. A player is often created already in this state, using the
createRealizedPlayer() method.
Prefetching: When the prefetch() method is called, a Player moves
from the realized state into the prefetching state. A prefetching Player is
preparing to present its media. During this phase, the Player preloads its
media data, obtains exclusive-use resources, and does whatever else is needed
to play the media data.
Prefetched: The state where the Player has finished prefetching media data
- it is ready to start.
Started: This state is entered when you call the start() method. The
Player is now ready to present the media data.
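Because realize() and prefetch() are asynchronous, code typically listens for the completion events and triggers the next transition. The following is a minimal sketch, assuming the JMF extension is on the classpath and that "file:clip.mov" names a playable media file (a placeholder for this example):

```java
import javax.media.*;

public class PlayerStates {
    public static void main(String[] args) throws Exception {
        Player player = Manager.createPlayer(new MediaLocator("file:clip.mov"));

        // React to the state-transition events described above.
        player.addControllerListener(event -> {
            if (event instanceof RealizeCompleteEvent) {
                // Realized -> Prefetching
                ((Player) event.getSourceController()).prefetch();
            } else if (event instanceof PrefetchCompleteEvent) {
                // Prefetched -> Started
                ((Player) event.getSourceController()).start();
            }
        });

        player.realize(); // Unrealized -> Realizing (returns immediately)
    }
}
```

The convenience method Manager.createRealizedPlayer() collapses the first two transitions by blocking until the Player reaches the realized state.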
Processor
A Processor is a type of Player, which has added control over what processing is
performed on the input media stream. As well as the six aforementioned Player
states, a Processor includes two additional states that occur before the
Processor enters the realizing state but after the unrealized state:
Configuring: A Processor enters the configuring state from the unrealized
state when the configure() method is called. A Processor remains in the
configuring state while it connects to the DataSource, demultiplexes the
input stream, and accesses information about the format of the input data.
Configured: From the configuring state, a Processor moves into the
configured state when it is connected to the DataSource and the data format
has been determined.
As with a Player, a Processor transitions to the realized state when the
realize()method is called.
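The life cycle described above can be sketched as a small table of states and the method calls that advance them. The class below is purely an illustrative model written for this report, not part of the JMF API; the enum and class names are hypothetical.

```java
import java.util.EnumMap;
import java.util.Map;

public class PlayerStates {
    // Illustrative model of the JMF Controller life cycle (not the real API).
    // A Processor adds CONFIGURING and CONFIGURED between UNREALIZED and REALIZING.
    enum State { UNREALIZED, CONFIGURING, CONFIGURED, REALIZING, REALIZED,
                 PREFETCHING, PREFETCHED, STARTED }

    // Method call that triggers the next forward transition from each resting state.
    static final Map<State, String> TRIGGER = new EnumMap<>(State.class);
    static {
        TRIGGER.put(State.UNREALIZED, "configure() or realize()");
        TRIGGER.put(State.CONFIGURED, "realize()");
        TRIGGER.put(State.REALIZED,   "prefetch()");
        TRIGGER.put(State.PREFETCHED, "start()");
    }

    public static void main(String[] args) {
        for (State s : State.values()) {
            String t = TRIGGER.get(s);
            System.out.println(s + (t != null ? "  --" + t + "-->" : ""));
        }
    }
}
```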
DataSink
The DataSink is a base interface for objects that read media content delivered by a
DataSource and render the media to some destination, typically a file.
Format
A Format object represents an object's exact media format. The format itself carries no
encoding-specific parameters or global-timing information; it describes the format's
encoding name and the type of data the format requires. Format subclasses include:
AudioFormat
VideoFormat
In turn, VideoFormat contains six direct subclasses:
H261Format
H263Format
IndexedColorFormat
JPEGFormat
RGBFormat
YUVFormat
As will be discussed in more detail later in this report, the formats that were chosen for this project were H.263 for the video and G.723.1 mono for the audio.
Manager
A manager, an intermediary object, integrates implementations of key interfaces that
can be used seamlessly with existing classes. JMF offers four managers:
Manager: Use Manager to create Players, Processors,
DataSources, and DataSinks.
PackageManager: This manager maintains a registry of packages that contain
JMF classes, such as custom Players, Processors, DataSources, and
DataSinks.
CaptureDeviceManager: This manager maintains a registry of available
capture devices.
PlugInManager: This manager maintains a registry of available JMF plug-in
processing components.
2.1.4 Common Media Formats
Table 2.1 and Table 2.2 below identify some of the characteristics of common media
formats. When selecting the formats for this system, the main consideration was the
bandwidth, which needed to be as low as possible, while the quality should be as
high as possible. The CPU requirement wasn't really an issue: each client would be
working on a separate computer with separate CPU capabilities, so it wasn't something
that needed to be taken into account in choosing the encoding schemes.
Looking at Table 2.1, which lists the most common video formats, it can be seen that
H.263 is the only one that meets the low bandwidth requirement. The quality is
medium, which is perfectly acceptable for this sort of application. Therefore, this was
the video encoding scheme that was chosen.
Looking at Table 2.2, for the audio, it can be seen that there are two formats that meet
the low bandwidth requirement: GSM and G.723.1. Of these two, the former has low
quality while the latter has medium quality. It therefore made more sense to
choose G.723.1.
Format    Content Type          Quality   CPU Requirements   Bandwidth Requirements
Cinepak   AVI, QuickTime        Medium    Low                High
MPEG-1    MPEG                  High      High               High
H.261     AVI, RTP              Low       Medium             Medium
H.263     QuickTime, AVI, RTP   Medium    Medium             Low
JPEG      QuickTime, AVI, RTP   High      High               High
Indeo     QuickTime, AVI        Medium    Medium             Medium

Table 2.1 JMF Common Video Formats
Format             Content Type               Quality   CPU Requirements   Bandwidth Requirements
PCM                AVI, QuickTime, WAV        High      Low                High
Mu-Law             AVI, QuickTime, WAV, RTP   Low       Low                High
ADPCM (DVI, IMA4)  AVI, QuickTime, WAV, RTP   Medium    Medium             Medium
MPEG-1             MPEG                       High      High               High
MPEG Layer 3       MPEG                       High      High               Medium
GSM                WAV, RTP                   Low       Low                Low
G.723.1            WAV, RTP                   Medium    Medium             Low

Table 2.2 JMF Common Audio Formats
As it happens, the schemes that were chosen are ideal for the application, as H.263 was
developed for video conferencing applications and is optimised for video where there
is not much movement, and G.723 is typically used for low bit rate speech, such as
telephony applications.
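The selection reasoning above can be expressed as a simple filter over the table data. The class below is a hypothetical sketch written for this report; it encodes the bandwidth columns of Tables 2.1 and 2.2 and picks out the formats that meet the low-bandwidth requirement.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;

public class FormatSelection {
    // Bandwidth requirement per format, taken from Tables 2.1 and 2.2.
    static final Map<String, String> VIDEO = Map.of(
        "Cinepak", "High", "MPEG-1", "High", "H.261", "Medium",
        "H.263", "Low", "JPEG", "High", "Indeo", "Medium");
    static final Map<String, String> AUDIO = Map.of(
        "PCM", "High", "Mu-Law", "High", "ADPCM", "Medium",
        "MPEG-1", "High", "MPEG Layer 3", "Medium",
        "GSM", "Low", "G.723.1", "Low");

    // Returns the formats whose bandwidth requirement is "Low", sorted by name.
    static List<String> lowBandwidth(Map<String, String> table) {
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, String> e : table.entrySet())
            if (e.getValue().equals("Low")) out.add(e.getKey());
        Collections.sort(out);
        return out;
    }

    public static void main(String[] args) {
        System.out.println("Low-bandwidth video: " + lowBandwidth(VIDEO)); // [H.263]
        System.out.println("Low-bandwidth audio: " + lowBandwidth(AUDIO)); // [G.723.1, GSM]
    }
}
```

Only H.263 survives the video filter; the audio filter leaves GSM and G.723.1, from which G.723.1 was chosen on quality grounds.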
2.1.5 Real Time Transport Protocol (RTP) Architecture in JMF
The JMF RTP APIs are designed to work seamlessly with the capture, presentation,
and processing capabilities of JMF. Players and processors are used to present and
manipulate RTP media streams just like any other media content. You can transmit
media streams that have been captured from a local capture device using a capture
DataSource or that have been stored to a file using a DataSink. Similarly, JMF can be
extended to support additional RTP formats and payloads through the standard plug-in
mechanism. [Java Media Framework API Guide,
http://java.sun.com/products/java-media/jmf/2.1.1/guide/index.html, November 19,
1999 (accessed April 2005)]
Session Manager
In JMF, a SessionManager is used to coordinate an RTP session. The session
manager keeps track of the session participants and the streams that are being
transmitted. The session manager maintains the state of the session as viewed from the
local participant. The SessionManager interface defines methods that enable an
application to initialize and start participating in a session, remove individual streams
created by the application, and close the entire session.
Session Statistics: The session manager maintains statistics on all of the RTP
and RTCP packets sent and received in the session. The session manager
provides access to two types of global statistics:
o GlobalReceptionStats: Maintains global reception statistics for the
session.
o GlobalTransmissionStats: Maintains cumulative transmission
statistics for all local senders.
Statistics for a particular recipient or outgoing stream are available from the
stream:
o ReceptionStats: Maintains source reception statistics for an individual
participant.
o TransmissionStats: Maintains transmission statistics for an individual
send stream.
Session Participants: The session manager keeps track of all of the
participants in a session. Each participant is represented by an instance of a
class that implements the Participant interface. A SessionManager creates a
Participant whenever an RTCP packet arrives that contains a source
description (SDES) with a canonical name (CNAME) that has not been seen
before in the session. A participant can own more than one stream, each of
which is identified by the synchronization source identifier (SSRC) used by the
source of the stream.
Session Streams: The SessionManager maintains an RTPStream object for
each stream of RTP data packets in the session. There are two types of RTP
streams:
o ReceiveStream represents a stream that's being received from a
remote participant.
o SendStream represents a stream of data coming from the
Processor or input DataSource that is being sent over the network.
A ReceiveStream is constructed automatically whenever the session
manager detects a new source of RTP data.
RTP Events
RTP-specific events are used to report on the state of the RTP session and streams. To
receive notification of RTP events, you implement the appropriate RTP listener and
register it with the session manager:
SessionListener: Receives notification of changes in the state of the session.
You can implement SessionListener to receive notification about events
that pertain to the RTP session as a whole, such as the addition of new
participants. There are two types of session-wide events:
o NewParticipantEvent: Indicates that a new participant has joined the
session.
o LocalCollisionEvent: Indicates that the participant's synchronization
source is already in use.
SendStreamListener: Receives notification of changes in the state of an RTP
stream that's being transmitted. You can implement SendStreamListener to
receive notification whenever:
o New send streams are created by the local participant.
o The transfer of data from the DataSource used to create the send stream
has started or stopped.
o The send stream's format or payload changes.
There are five types of events associated with a SendStream:
o NewSendStreamEvent: Indicates that a new send stream has been
created by the local participant.
o ActiveSendStreamEvent: Indicates that the transfer of data from the
DataSource used to create the send stream has started.
o InactiveSendStreamEvent: Indicates that the transfer of data from the
DataSource used to create the send stream has stopped.
o LocalPayloadChangeEvent: Indicates that the stream's format or
payload has changed.
o StreamClosedEvent: Indicates that the stream has been closed.
ReceiveStreamListener: Receives notification of changes in the state of an
RTP stream that's being received. You can implement
ReceiveStreamListener to receive notification whenever:
o New receive streams are created.
o The transfer of data starts or stops.
o The data transfer times out.
o A previously orphaned ReceiveStream has been associated with a
Participant.
o An RTCP APP packet is received.
o The receive stream's format or payload changes.
You can also use this interface to get a handle on the stream and access the
RTP DataSource so that you can create a MediaHandler.
There are seven types of events associated with a ReceiveStream:
o NewReceiveStreamEvent: Indicates that the session manager has
created a new receive stream for a newly detected source.
o ActiveReceiveStreamEvent: Indicates that the transfer of data has
started.
o InactiveReceiveStreamEvent: Indicates that the transfer of data has
stopped.
o TimeoutEvent: Indicates that the data transfer has timed out.
o RemotePayloadChangeEvent: Indicates that the format or payload of
the receive stream has changed.
o StreamMappedEvent: Indicates that a previously orphaned receive
stream has been associated with a participant.
o ApplicationEvent: Indicates that an RTCP APP packet has been
received.
RemoteListener: Receives notification of events or RTP control messages
received from a remote participant. You might want to implement
RemoteListener in an application used to monitor the session - it enables
you to receive RTCP reports and monitor the quality of the session reception
without having to receive data or information on each stream. There are three
types of events associated with a remote participant:
o ReceiverReportEvent: Indicates that an RTP receiver report has been
received.
o SenderReportEvent: Indicates that an RTP sender report has been
received.
o RemoteCollisionEvent: Indicates that two remote participants are
using the same synchronization source ID (SSRC).
2.1.6 Alternatives to JMF
There was no real alternative to JMF while using Java. However, if another
programming language had been used, there would have been alternatives available.
An example would be to use the C++ programming language in conjunction with the
Microsoft DirectShow API, which includes libraries for rendering media content.
There is currently an open source project under way to create a SIP communicator
using Java and JMF. Aside from this, there are no real similar applications using
Java, and this was the reason that Java was chosen.
2.1.7 Summary
As can be seen from the above sections, JMF is a very powerful tool. It is very easy to
work with and the best way to understand it is to use it. It is fair to say that there is a
lot of information, such as forums, help-sites etc. on the World Wide Web regarding
this subject. However, there is not a lot of information on using JMF for projects
similar to this one. Perhaps one of the best features of JMF is that it does not require
one to learn everything about it before using it. With a basic understanding of Java, it
is possible to teach yourself as you go along.
2.2 Real-Time Transport Protocol
2.2.1 Introduction
The real-time transport protocol (RTP) provides end-to-end delivery services for data
with real-time characteristics, such as interactive audio and video. These services
include payload type identification, sequence numbering, time-stamping and delivery
monitoring. Applications typically run RTP on top of UDP to make use of its
multiplexing and checksum services; both protocols contribute to parts of the
transport protocol functionality. However, RTP may be used with other suitable
underlying network or transport protocols. RTP supports data transfer to multiple
destinations using multicast distribution if provided by the underlying network. [RTP
Technology, http://www.ixiacom.com/library/technology_guides/tg_display.php?key=rtp,
(April 2005)]
Although RTP is used for real-time media, it does not itself ensure that packets are
delivered on time; it relies on lower-layer services to provide this and other
quality-of-service (QoS) guarantees. Each packet carries a sequence number, which
allows the receiver to reconstruct the packets in the correct order.
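As a sketch of how a receiver can use sequence numbers to restore order, the following hypothetical Java fragment sorts out-of-order packets, treating the 16-bit sequence number as wrapping modulo 65536. The comparison is only meaningful for packets that are close together in sequence, as would be the case in a receiver's reordering window.

```java
import java.util.ArrayList;
import java.util.List;

public class SeqReorder {
    // RTP sequence numbers are 16 bits and wrap at 65536. Casting the
    // difference to a signed 16-bit value treats 0 as coming "after" 65535,
    // provided the two numbers are within half the sequence space.
    static int seqCompare(int a, int b) {
        return (short) (a - b);   // negative if a precedes b, modulo 2^16
    }

    public static void main(String[] args) {
        // Packets arriving out of order across the wrap boundary.
        List<Integer> arrived = new ArrayList<>(List.of(65534, 1, 65535, 0));
        arrived.sort(SeqReorder::seqCompare);
        System.out.println(arrived); // [65534, 65535, 0, 1]
    }
}
```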
In defining RTP, two closely linked parts will be described:
The real-time transport protocol (RTP), to carry data that has real-time
properties,
The RTP control protocol (RTCP), to monitor the quality of service and to
convey information about the participants in an on-going session.
The diagram shown below in Figure 2.4 - RTP and the OSI Model illustrates
how RTP is incorporated into the OSI model. RTP fits into the session layer of
the model, between the application layer and the transport layer. RTP and RTCP work
independently of the underlying transport layer and network layer protocols.
Information in the RTP header tells the receiver how to reconstruct the data and
describes how the codec bit streams are packetized.
Figure 2.4 - RTP and the OSI Model
2.2.2 Some RTP Definitions
RTP payload: The data transported by RTP in a packet, for example audio
samples or compressed video data.
RTP packet: A data packet consisting of the fixed RTP header, a possibly
empty list of contributing sources, and the payload data. Some underlying
protocols may require an encapsulation of the RTP packet to be defined.
Typically one packet of the underlying protocol contains a single RTP packet,
but several RTP packets may be contained if permitted by the encapsulation
method.
RTCP packet: A control packet consisting of a fixed header part similar to
that of RTP data packets, followed by structured elements that vary depending
upon the RTCP packet type. Typically, multiple RTCP packets are sent
together as a compound RTCP packet in a single packet of the underlying
protocol; this is enabled by the length field in the fixed header of each RTCP
packet.
Port: The abstraction that transport protocols use to distinguish among
multiple destinations within a given host computer. TCP/IP protocols identify
ports using small positive integers. RTP depends upon the lower-layer protocol
to provide some mechanism such as ports to multiplex the RTP and RTCP
packets of a session.
Transport address: The combination of a network address and port that
identifies a transport-level endpoint, for example an IP address and a UDP
port. Packets are transmitted from a source transport address to a destination
transport address.
RTP session: The association among a set of participants communicating with
RTP. For each participant, the session is defined by a particular pair of
destination transport addresses (one network address plus a port pair for RTP
and RTCP). The destination transport address pair may be common for all
participants, as in the case of IP multicast, or may be different for each, as in
the case of individual unicast network addresses plus a common port pair. In a
multimedia session, each medium is carried in a separate RTP session with its
own RTCP packets. The multiple RTP sessions are distinguished by different
port number pairs and/or different multicast addresses.
Synchronization source (SSRC): The source of a stream of RTP packets,
identified by a 32-bit numeric SSRC identifier carried in the RTP header so as
not to be dependent upon the network address. All packets from a
synchronization source form part of the same timing and sequence number
space, so a receiver groups packets by synchronization source for playback.
Examples of synchronization sources include the sender of a stream of packets
derived from a signal source such as a microphone or a camera, or an RTP
mixer. A synchronization source may change its data format, e.g., audio
encoding, over time. The SSRC identifier is a randomly chosen value meant to
be globally unique within a particular RTP session. A participant need not use
the same SSRC identifier for all the RTP sessions in a multimedia session; the
binding of the SSRC identifiers is provided through RTCP. If a participant
generates multiple streams in one RTP session, for example from separate
video cameras, each must be identified as a different SSRC.
Contributing source (CSRC): A source of a stream of RTP packets that has
contributed to the combined stream produced by an RTP mixer. The mixer
inserts a list of the SSRC identifiers of the sources that contributed to the
generation of a particular packet into the RTP header of that packet. This list is
called the CSRC list. An example application is audio conferencing where a
mixer indicates all the talkers whose speech was combined to produce the
outgoing packet, allowing the receiver to indicate the current talker, even
though all the audio packets contain the same SSRC identifier (that of the
mixer).
End system: An application that generates the content to be sent in RTP
packets and/or consumes the content of received RTP packets. An end system
can act as one or more synchronization sources in a particular RTP session, but
typically only one.
Mixer: An intermediate system that receives RTP packets from one or more
sources, possibly changes the data format, combines the packets in some
manner and then forwards a new RTP packet. Since the timing among multiple
input sources will not generally be synchronized, the mixer will make timing
adjustments among the streams and generate its own timing for the combined
stream. Thus, all data packets originating from a mixer will be identified as
having the mixer as their synchronization source.
Translator: An intermediate system that forwards RTP packets with their
synchronization source identifier intact. Examples of translators include
devices that convert encodings without mixing, replicators from multicast to
unicast, and application-level filters in firewalls.
Monitor: An application that receives RTCP packets sent by participants in an
RTP session, in particular the reception reports, and estimates the current
quality of service for distribution monitoring, fault diagnosis and long-term
statistics. The monitor function is likely to be built into the application(s)
participating in the session, but may also be a separate application that does not
otherwise participate and does not send or receive the RTP data packets. These
are called third party monitors.
2.2.3 RTP Data Structures
Figure 2.5 below shows the structure of an RTP packet, with explanations of the
different components before it.
V is the Version, which identifies the RTP version.
P is the Padding for the protocols or algorithms that require a packet to be a
specific size. The padding field is a variable field that when set indicates that
the space at the end of the payload is padded with octets to make the packet the
proper size.
X is the Extension bit; when set, the fixed header is followed by exactly one
header extension with a defined format.
CSRC count contains the number of CSRC identifiers that follow the fixed
header.
M is the Marker, whose interpretation is defined by a profile; it is intended to
allow significant events such as frame boundaries to be marked in the packet
stream.
Payload type identifies the format of the RTP payload and determines its
interpretation by the application. A profile specifies a default static mapping of
payload type codes to payload formats. Additional payload type codes may be
defined dynamically through non-RTP means.
Sequence number increments by one for each RTP data packet sent, and may
be used by the receiver to detect packet loss and to restore packet sequence.
Timestamp reflects the sampling instant of the first octet in the RTP data
packet. The sampling instant must be derived from a clock that increments
monotonically and linearly in time to allow synchronization and jitter
calculations.
SSRC is an identifier that is chosen randomly, with the intent that no two
synchronization sources within the same RTP session have the same SSRC
identifier.
CSRC identifies the contributing sources for the payload contained in this
packet. This is another layer of identification for sessions that have the same
SSRC number, but the data in the stream needs to be differentiated further.
0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
| V | P | X | CC | M | PT | Sequence Number |
| Timestamp |
| Synchronization Source (SSRC) Identifier |
| Contributing Source (CSRC) Identifiers |
| .... |
| Payload |
| .... |

Figure 2.5 RTP Packet Header Format
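The layout of Figure 2.5 can be illustrated with a small, hypothetical Java helper that packs and unpacks the 12-octet fixed header using standard bit operations. No CSRC list or header extension is handled in this sketch, and the payload type 34 used in the example is the static RTP assignment for H.263.

```java
import java.nio.ByteBuffer;

public class RtpHeader {
    // Packs the 12-byte fixed RTP header of Figure 2.5
    // (V=2, no padding, no extension, no CSRCs in this sketch).
    static byte[] pack(boolean marker, int payloadType, int seq, long ts, long ssrc) {
        ByteBuffer b = ByteBuffer.allocate(12);
        b.put((byte) (2 << 6));                                  // V=2, P=0, X=0, CC=0
        b.put((byte) ((marker ? 0x80 : 0) | (payloadType & 0x7F))); // M + 7-bit PT
        b.putShort((short) seq);                                 // sequence number
        b.putInt((int) ts);                                      // timestamp
        b.putInt((int) ssrc);                                    // SSRC identifier
        return b.array();
    }

    static int version(byte[] h)     { return (h[0] >> 6) & 0x03; }
    static int payloadType(byte[] h) { return h[1] & 0x7F; }
    static int seq(byte[] h)         { return ((h[2] & 0xFF) << 8) | (h[3] & 0xFF); }

    public static void main(String[] args) {
        byte[] h = pack(true, 34, 4660, 100L, 0xDEADBEEFL); // PT 34 = H.263
        System.out.println(version(h) + " " + payloadType(h) + " " + seq(h));
        // prints: 2 34 4660
    }
}
```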
2.2.4 RTP Control Protocol
The RTP Control Protocol (RTCP) works by periodically transmitting control packets
to all participants in the session, in much the same manner as data packets are
transmitted. RTCP performs four functions:
1. It provides feedback on the quality of the data distribution.
2. It carries a persistent transport-level identifier for an RTP source called the
canonical name or CNAME.
3. By having each participant send its control packets to all the others, each can
independently observe the number of participants, and this number is used to
calculate the rate at which the packets are sent.
4. It conveys minimal session control information; this is an optional function.
RTCP serves as a convenient channel to reach all the participants, but it is not
necessarily expected to support all the control communication requirements of
an application.
Functions 1-3 are mandatory when RTP is used in the IP multicast environment, and
are recommended for all environments. RTP application designers are advised to avoid
mechanisms that can only work in unicast mode and will not scale to larger numbers.
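The way the reporting rate is derived from the observed participant count can be sketched as follows. The figures used (a 5% share of the session bandwidth for RTCP and a 5-second minimum interval) are the commonly cited defaults from the RTP specification; the class and method names are hypothetical, and this is a simplification of the full interval calculation.

```java
public class RtcpInterval {
    // Simplified sketch of RTCP interval scaling: control traffic for all
    // participants together is capped at a fixed fraction (typically 5%) of
    // the session bandwidth, so each participant's interval grows with the
    // size of the group.
    static double intervalSeconds(int participants, double avgRtcpPacketBits,
                                  double sessionBandwidthBps) {
        double rtcpBw = 0.05 * sessionBandwidthBps;    // 5% share for RTCP
        double t = participants * avgRtcpPacketBits / rtcpBw;
        return Math.max(t, 5.0);                       // 5-second minimum
    }

    public static void main(String[] args) {
        // 10 participants, 800-bit compound RTCP packets, 64 kbit/s session.
        System.out.println(intervalSeconds(10, 800, 64000));   // 5.0 (minimum applies)
        System.out.println(intervalSeconds(1000, 800, 64000)); // 250.0
    }
}
```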
RTCP Packet Format
As mentioned above, RTCP packets are sent periodically to all participants, alongside
the data packets. There are a number of types of RTCP packets:
Sender Report
Receiver Report
Source Description
Bye
Application-specific
All participants in a session send RTCP packets. A participant that has recently sent
data packets issues a Sender Report (SR). The sender report contains the total number
of packets and bytes sent as well as information that can be used to synchronize media
streams from different sessions. The structure of the RTCP SR is shown in Figure 2.6
below. It consists of three sections, possibly followed by a fourth profile-specific
extension section if defined.
The first section, the header, is 8 octets long, with the following fields:
The version (V) is 2 bits and identifies the version of RTP, which is the same
in RTCP packets as in RTP data packets.
The padding (P) is 1 bit; if the padding bit is set, this RTCP packet contains
some additional padding octets at the end which are not part of the control
information. The last octet of the padding is a count of how many padding
octets should be ignored.
The reception report count (RC) is 5 bits and represents the number of
reception report blocks contained in this packet.
The packet type (PT) is 8 bits and contains the constant 200 to identify this as
an RTCP SR packet.
The length is 16 bits and is the length of this RTCP packet in 32-bit words
minus one, including the header and any padding.
The SSRC is 32 bits and is the synchronization source identifier for the
originator of this SR packet.
The second section, the sender information, is 20 octets long and is present in every
sender report packet. It summarizes the data transmissions from this sender and has the
following fields:
The NTP timestamp is 64 bits and indicates the wallclock time when this
report was sent so that it may be used in combination with timestamps returned
in reception reports from other receivers to measure round-trip propagation to
those receivers.
The RTP timestamp is 32 bits and corresponds to the same time as the NTP
timestamp (above), but in the same units and with the same random offset as
the RTP timestamps in data packets.
The sender's packet count is 32 bits and is the total number of RTP data
packets transmitted by the sender since starting transmission up until the time
this SR packet was generated. The count is reset if the sender changes its
SSRC identifier.
The sender's octet count is 32 bits and is the total number of payload octets
(i.e., not including header or padding) transmitted in RTP data packets by the
sender since starting transmission up until the time this SR packet was
generated. The count is reset if the sender changes its SSRC identifier. This
field can be used to estimate the average payload data rate.
The third section contains zero or more reception report blocks depending on the
number of other sources heard by this sender since the last report. Each reception
report block conveys statistics on the reception of RTP packets from a single
synchronization source. Receivers do not carry over statistics when a source changes
its SSRC identifier due to a collision. These statistics are:
The SSRC_n (source identifier) is 32 bits and is the SSRC identifier of the
source to which the information in this reception report block pertains.
The fraction lost is 8 bits and is the fraction of RTP data packets from source
SSRC_n lost since the previous SR or RR packet was sent, expressed as a fixed
point number with the binary point at the left edge of the field.
The cumulative number of packets lost is 24 bits and is the total number of
RTP data packets from source SSRC_n that have been lost since the beginning
of reception. This number is defined to be the number of packets expected less
the number of packets actually received, where the number of packets received
includes any which are late or duplicates.
The extended highest sequence number received is 32 bits. The low 16 bits
contain the highest sequence number received in an RTP data packet from
source SSRC_n, and the most significant 16 bits extend that sequence number
with the corresponding count of sequence number cycles.
The interarrival jitter is 32 bits and is an estimate of the statistical variance of
the RTP data packet interarrival time, measured in timestamp units and
expressed as an unsigned integer. The interarrival jitter J is defined to be the
mean deviation (smoothed absolute value) of the difference D in packet
spacing at the receiver compared to the sender for a pair of packets.
The last SR timestamp (LSR) is 32 bits and is the middle 32 bits out of 64 in
the NTP timestamp received as part of the most recent RTCP sender report
(SR) packet from source SSRC_n. If no SR has been received yet, the field is
set to zero.
The delay since last SR (DLSR) is 32 bits and is expressed in units of 1/65536
seconds, between receiving the last SR packet from source SSRC_n and
sending this reception report block. If no SR packet has been received yet from
SSRC_n, the DLSR field is set to zero.
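Some of the arithmetic behind these fields can be illustrated with a short, hypothetical Java sketch: the fraction lost as an 8-bit fixed-point value, the standard recursive jitter estimate, and the DLSR unit conversion. The class and method names are illustrative only.

```java
public class ReceptionStats {
    // Fraction lost: packets lost / packets expected, as an 8-bit fixed-point
    // value with the binary point at the left edge (i.e. scaled by 256).
    static int fractionLost(int expected, int received) {
        if (expected <= 0) return 0;
        int lost = expected - received;
        if (lost <= 0) return 0;              // duplicates can make this negative
        return (lost << 8) / expected;
    }

    // Interarrival jitter estimator: for each arriving packet,
    // J = J + (|D| - J) / 16, where D is the difference in packet spacing
    // at the receiver compared to the sender, in timestamp units.
    static double updateJitter(double j, double d) {
        return j + (Math.abs(d) - j) / 16.0;
    }

    // DLSR is expressed in units of 1/65536 seconds.
    static long dlsrUnits(double seconds) {
        return Math.round(seconds * 65536.0);
    }

    public static void main(String[] args) {
        System.out.println(fractionLost(100, 75)); // 25% loss -> 64 (= 0.25 * 256)
        System.out.println(dlsrUnits(1.5));        // 98304
    }
}
```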
0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
| V | P | RC | PT = SR = 200 | Length |
| SSRC of Sender |
| NTP Timestamp, most significant word |
| NTP Timestamp, least significant word |
| RTP Timestamp |
| Sender's Packet Count |
| Sender's Octet Count |
| SSRC_1 (SSRC of first source) |
| Fraction Lost | Cumulative Number of Packets Lost |
| Extended Highest Sequence Number Received |
| Interarrival Jitter |
| Last SR (LSR) |
| Delay Since Last SR (DLSR) |
| SSRC_2 (SSRC of second source) |
| .... |
| Profile-Specific Extensions |

Figure 2.6 - RTCP Sender Report Structure
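As a worked example of the header fields above, the following hypothetical Java fragment packs the first 32-bit word of an SR packet: version 2, the 5-bit reception report count, PT = 200, and the length field expressed in 32-bit words minus one.

```java
public class RtcpSrHeader {
    // First 32-bit word of an RTCP Sender Report: V=2, P, 5-bit RC,
    // PT = 200, and length = (packet length in 32-bit words) - 1.
    static int firstWord(int rc, int lengthWords) {
        int v = 2, p = 0, pt = 200;
        return (v << 30) | (p << 29) | ((rc & 0x1F) << 24)
             | ((pt & 0xFF) << 16) | ((lengthWords - 1) & 0xFFFF);
    }

    public static void main(String[] args) {
        // SR with one reception report block: 8-octet header + 20-octet
        // sender information + 24-octet report block = 52 octets = 13 words.
        int w = firstWord(1, 13);
        System.out.printf("0x%08X%n", w);   // 0x81C8000C
    }
}
```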
Session participants periodically issue Receiver Reports (RR) for all of the sources
from which they are receiving data packets. A receiver report contains information
about the number of packets lost, the highest sequence number received, and a
timestamp that can be used to estimate the round-trip delay between a sender and the
receiver. The format of the receiver report (RR) packet, as shown in Figure 2.7 below,
is the same as that of the SR packet except that the packet type field contains the
constant 201 and the five words of sender information are omitted (these are the NTP
and RTP timestamps and sender's packet and octet counts). The remaining fields have
the same meaning as for the SR packet. An empty RR packet (RC = 0) is put at the
head of a compound RTCP packet when there is no data transmission or reception to
report.
0                   1                   2                   3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
| V | P | RC | PT = RR = 201 | Length |
| SSRC of Sender |
| SSRC_1 (SSRC of first source) |
| Fraction Lost | Cumulative Number of Packets Lost |
| Extended Highest Sequence Number Received |
| Interarrival Jitter |
| Last SR (LSR) |
| Delay Since Last SR (DLSR) |
| SSRC_2 (SSRC of second source) |
| .... |
| Profile-Specific Extensions |

Figure 2.7 - RTCP Receiver Report Structure
2.2.5 Alternatives to RTP
Once JMF had been chosen, there was really no better option than the real-time
transport protocol. However, it would have been possible to implement a proprietary
protocol using the custom packetizers provided by JMF, along with UDP or TCP.
TCP, though, is not suitable for real-time data because of the delays it introduces due
to packet retransmission, and UDP is unsuitable without higher-level features to deal
with packet sequencing and loss. Another alternative to RTP could have been RTSP;
however, JMF has only limited compatibility with it.
2.2.6 Summary
The Real-Time Transport Protocol is far more expansive than described above.
However, for the purposes for which it was used within this project, the detail given
above is more than adequate. It is important to understand the different packet
structures that are shown, as these form the basis by which all data within the system
was sent.
2.3 Audio Encoding Scheme G.723.1
2.3.1 Introduction
As mentioned earlier, the audio encoding scheme which was chosen was G.723.1. This
format is ideal for compressing the audio signal component of multimedia services at a
very low bit rate. In this application it will be used for the audio side of the video
conferencing. The coder that is used was designed to represent speech with a high
quality using a limited amount of complexity. It is not ideal for audio signals other
than speech, for example music, but can be used for them.
The coder involved can operate at one of two bit rates, either 5.3 kbit/s or 6.3 kbit/s.
The higher bit rate gives better quality; the lower rate, whilst still maintaining
adequate quality, offers more flexibility to the designer. Both rates must be
implemented in the encoder and the decoder. [3]
Audio signals are encoded by the coder in 30 msec frames; there is also a look ahead
of 7.5 msec. This results in a total delay of 37.5 msec. Any additional delays in the
operation and implementation of the coder can be attributed to:
actual time spent processing the data in the encoder and decoder,
transmission time on the communication link,
additional buffering delay for the multiplexing protocol.
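The timing figures above can be checked with a little arithmetic. The sketch below is illustrative only; the per-frame octet counts (24 octets at the 6.3 kbit/s rate, 20 octets at the 5.3 kbit/s rate) are the commonly quoted values for G.723.1 and should be treated as assumptions here, and the class and method names are hypothetical.

```java
public class G7231Framing {
    // G.723.1 frames cover 30 ms of speech; with 7.5 ms of look-ahead the
    // algorithmic delay is 30 + 7.5 = 37.5 ms, before any processing,
    // transmission or multiplexing delays are added.
    static double algorithmicDelayMs() { return 30.0 + 7.5; }

    // Gross bit rate implied by a given frame payload size: bits per
    // 30 ms frame, expressed in kbit/s.
    static double bitRateKbps(int octetsPerFrame) {
        return octetsPerFrame * 8 / 0.030 / 1000.0;
    }

    public static void main(String[] args) {
        System.out.println(algorithmicDelayMs()); // 37.5
        System.out.println(bitRateKbps(24));      // ~6.4 (6.3 kbit/s nominal rate)
        System.out.println(bitRateKbps(20));      // ~5.33 (5.3 kbit/s nominal rate)
    }
}
```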
2.3.2 Encoder Principles
The block diagram of the encoder is shown in Figure 2.8 below. As can be seen there
are a number of different blocks, the functions of which are beyond the scope of this
project.
Figure 2.8 - G.723.1 Encoder
2.3.3 Decoder Principles
The block diagram of the decoder is shown below, in Figure 2.9. It is shown for
diagrammatic purposes only; the functions of the blocks do not need to be understood for this project.
Figure 2.9 - G.723.1 Decoder
2.3.4 Alternative Audio Encoding Schemes
As shown in Table 2.2, JMF Common Audio Formats, the only other format with the
required low bandwidth is GSM. The reason that this format was not chosen is that
G.723 mono offers better quality; this was the deciding factor. ADPCM (DVI, IMA4)
and Mu-Law are also suitable for RTP data; however, they do not meet the low
bandwidth requirement.
2.3.5 Summary
This format was well chosen, as it is ideal for its purpose within this application:
the voice part of the video conferencing. Although it is possible to go very deep
into the workings of the coder and decoder, that is not necessary for this project.
It is sufficient to know the basics of how it works and what it is suitable for.
2.4 Video Encoding Scheme H.263
2.4.1 Introduction
The H.263 format is ideal for encoding video images without much movement, at low
bit rates. Pictures are sampled at an integer multiple of the video line rate. This
sampling clock and the digital network clock are asynchronous. The transmission
clock is provided externally. The video bit rate may be variable. [4]
2.4.2 Summary of Operation
The diagram in Figure 2.10 shows an H.263 baseline encoder. The algorithms
involved in the operation of this encoder are beyond the scope of this project. It is
sufficient to know that they exist and are used in the encoding scheme.
Figure 2.10 - H.263 Baseline Encoder
2.4.3 Alternative Video Encoding Schemes
As shown in Table 2.1, JMF Common Video Formats, the other video formats
supported by RTP include H.261 and JPEG; however, neither of these meets the low
bandwidth requirement. At the beginning of the project, it was thought that MPEG
would be used. It was not chosen because MPEG does not support
capture from a live video source; it would only support pre-recorded video or
capture from an MPEG-enabled data source. This would not have been suitable for
video calls.
2.4.4 Summary
H.263 can be used for compressing the moving picture component of audio-visual
services at low bit rates. It is ideal for uses in video conferencing as there is not much
movement involved and low bit rates are used. This makes it the ideal encoding
scheme for this application.
2.5 Image Observation
2.5.1 Initial Ideas
Initially, it was thought that some kind of motion detection algorithm would be used to
implement the image observation feature. A number of possibilities were looked into
when researching this prospect, some of which included:
- Motion Estimation: used to predict frames within a video sequence from
previous frames, with the help of motion vectors. The use of motion vectors
means that only the changes between frames are sent, as opposed to the whole
frame.
- Fixed Size Block Matching: each image frame is divided into a fixed number
of blocks. For each block in the frame, a search is made over an area of the
reference frame for the best matching block, to give the least prediction error.
- Motion Compensation: uses blocks from a past frame to construct a replica
of the current frame. For each block in the current frame, a matching block is
found in the past frame and, if suitable, its motion vector is substituted for
the block during transmission.
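The fixed size block matching technique described above can be sketched in a few lines. The code below is an illustrative exhaustive search over a small window, using the sum of absolute differences (SAD) as the prediction error; it is not the search actually used inside H.263.

```java
// Illustrative fixed-size block matching: find the offset in the
// reference frame whose block best matches (lowest SAD) the block
// at (bx, by) in the current frame. Frames are 2-D luma sample arrays.
class BlockMatcher {
    static int[] bestMatch(int[][] cur, int[][] ref, int bx, int by, int block, int range) {
        int bestDx = 0, bestDy = 0;
        long bestSad = Long.MAX_VALUE;
        for (int dy = -range; dy <= range; dy++) {
            for (int dx = -range; dx <= range; dx++) {
                int ox = bx + dx, oy = by + dy;
                // Skip candidate blocks that fall outside the reference frame.
                if (ox < 0 || oy < 0 || ox + block > ref[0].length || oy + block > ref.length) continue;
                long sad = 0;
                for (int y = 0; y < block; y++)
                    for (int x = 0; x < block; x++)
                        sad += Math.abs(cur[by + y][bx + x] - ref[oy + y][ox + x]);
                if (sad < bestSad) { bestSad = sad; bestDx = dx; bestDy = dy; }
            }
        }
        return new int[] { bestDx, bestDy }; // motion vector for this block
    }
}
```

A real encoder applies this per macroblock and transmits only the vector and the residual, which is the source of the bandwidth saving described above.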
After examining the specification for H.263, it was discovered that motion
detection and compensation algorithms were built into it. This meant that the algorithm did
not have to be coded; it was already there and available to use. RTCP reports were
used to monitor the byte rate of the video stream, which was then used to implement the
image observation.
2.5.2 The Way it Works
Basically, the H.263 video encoding scheme was used in the implementation of the
image observation. The motion estimation and compensation that is built into the
format was used [1]. This assumes that the pixels within a current picture can be
modelled as a translation of those within a previous picture. Each macroblock is
predicted from the previous frame. The concept of macroblocks is explained below in
Figure 2.11. Each pixel within the macroblock undergoes the same amount of
translational motion, which is represented by two-dimensional motion vectors or
displacement vectors.
Figure 2.11 - Macroblocks within H.263
The basic idea behind the motion detection is shown in Figure 2.12 below.
Figure 2.12 - Motion Prediction
The way that the above was used for the image observation is as follows. When a
frame has not changed, a reference to a previous frame is sent. Basically, the image
observation feature exploits the temporal redundancy inherent in a video sequence.
The redundancy is larger when the camera is focused on a scene that does not contain
a lot of movement, which is the case when a user leaves the shot. This redundancy is
reflected in a reduced RTCP byte rate.
Displaying this reference frame requires a lower byte rate than sending a new
frame. The RTCP reports monitor the byte rate of the video stream: if the byte
rate drops, and stays dropped for a certain period of time, the call is ended. The
procedure to end the call is explained in more detail in section 3.5.
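The end-of-call logic described above can be sketched as a simple counter over successive RTCP byte-rate reports. The threshold and window values below are hypothetical, chosen only for illustration; they are not the project's actual values.

```java
// Sketch of the image-observation check: feed in the byte rate reported
// by successive RTCP intervals, and signal end-of-call once the rate has
// stayed below a threshold for enough consecutive reports.
class ByteRateMonitor {
    private final long threshold;   // bytes/s regarded as "no motion" (hypothetical)
    private final int window;       // consecutive low reports required (hypothetical)
    private int lowCount = 0;

    ByteRateMonitor(long threshold, int window) {
        this.threshold = threshold;
        this.window = window;
    }

    // Returns true when the call should be ended. Any report above the
    // threshold resets the counter, so brief dips are tolerated.
    boolean report(long bytesPerSecond) {
        lowCount = (bytesPerSecond < threshold) ? lowCount + 1 : 0;
        return lowCount >= window;
    }
}
```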
2.6 Multicasting
For the conferencing feature of this application, multicasting was used. All of the
participants within the conference transmit to a multicast address.
2.6.1 Alternatives to Multicasting
Another option considered for the conferencing feature was simply to allow all
participants to transmit to and receive from each other at the same time. This setup
is shown in Figure 2.13. Basically, when the conference button was pressed, one call
would have been set up on top of another, so that two calls could take place
simultaneously and participants would listen to all streams.
Figure 2.13 - Original Conferencing Plan
As can be imagined, this method would be very cumbersome. It would use a lot of
system resources, as an unnecessary number of streams would be sent, and it is
impractical for a user to have to transmit their data more than once. This idea was
therefore decided against.
2.6.2 What is Multicasting
Multicasting is when a packet is sent to a host group, which is a set of hosts
identified by a single IP address. A multicast datagram is then delivered to all
members of the destination group [2]. Hosts may join or leave the group at any time, as
membership of the group is dynamic. A host can be a member of more than one group at a
time, and a host does not need to be a member of a group to send datagrams to it. There
are two types of host group: a permanent host group is one which has a well-known,
administratively assigned IP address, which is permanent. A permanent host group can
have any number of members, even zero. The remaining multicast IP addresses are
available for dynamic assignment to the other type of group, known as a
transient group. This second type of group exists only as long as it has members.
The forwarding of IP multicast datagrams is handled by multicast routers. When a
datagram is transmitted by the host, it is sent as a local network multicast and will be
delivered to all members of the destination host group. The addresses which are
allocated to the host groups are known as class D IP addresses and range from
224.0.0.0 to 239.255.255.255. The diagram in Figure 2.14 shows how the data is
distributed to all members of the group.
Figure 2.14 - Multicasting through Router
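A minimal Java sketch of these ideas is given below: checking that an address falls in the class D range, and joining a group to send one datagram with java.net. The group address and port are arbitrary examples, not values from the project.

```java
import java.io.IOException;
import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;

// Sketch of multicast use with java.net. The group address and port
// used by callers are arbitrary examples from the class D range.
class MulticastSketch {
    // True for class D addresses, i.e. 224.0.0.0 to 239.255.255.255.
    static boolean isClassD(String ip) throws IOException {
        return InetAddress.getByName(ip).isMulticastAddress();
    }

    // Join a group, send one datagram to it, then leave. Every host that
    // has joined the same group receives the datagram.
    static void sendToGroup(String groupIp, int port, byte[] data) throws IOException {
        InetAddress group = InetAddress.getByName(groupIp);
        try (MulticastSocket socket = new MulticastSocket(port)) {
            socket.joinGroup(group);   // membership is dynamic
            socket.send(new DatagramPacket(data, data.length, group, port));
            socket.leaveGroup(group);  // hosts may leave at any time
        }
    }
}
```

For example, `sendToGroup("230.0.0.1", 4446, data)` would reach every host that joined 230.0.0.1, which is how all conference participants receive each other's streams.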
2.7 Summary
The information contained within this chapter has been an invaluable asset in
developing this application. A firm understanding of all the standards was required
before coding could even begin. JMF placed a lot of restrictions on the standards that
could be used. JMF does provide the ability to implement custom packetizers and
custom encoders, however to so would have been time consuming and unnecessary for
this application.
Chapter 3
Design of the System
3.1 System Architecture
The system as it stands consists of two different communication architectures: one is
client to server and the other is client to client. The reason for having two different
methods is to make the system as efficient as possible. Using client to server for all
communication was a possibility; however, it was felt that this would be inefficient,
as the server did not need to be part of a call between two clients and its involvement
would have been an unnecessary use of system resources. For this reason, calls between
clients are peer to peer, while all other communication goes through the server.
3.1.1 Client to Server Communication
This architecture is used for all system messages, the setting up of calls, and so on;
basically, for everything other than the calls themselves. There is one server, and any
number of clients can be connected to it. The client to server configuration is
shown in Figure 3.1.
Figure 3.1 - Client to Server Communication
The connections between the server and the clients are bidirectional TCP connections.
It was not necessary to use RTP here as they are not real time connections. RTP is
described in section 2.2 as being ideal for real time communication. The messages that
are sent between the clients and server will include login, logoff, messages to be sent,
calls to be made etc. which are not time dependent. The server plays an integral part in
the system. Basically, all communication between any two clients must first go
through the server. So if a client wishes to call another client, they must send a call
request to the server. The server will then proceed to set up the call between the
clients. The code for this is shown in Appendix 1 in section 7.1. Also included in
Appendix 1 are the code extracts for login request (section 7.2), logoff request (section
7.3), call end request (section 7.4), conference setup request (section 7.5), request to
add a participant to a conference (section 7.6), request to end a conference (section
7.7), request to send a message (section 7.8) and request to receive a message (section 7.9).
The purpose of including these code extracts is to show that the server really does
control everything that the clients want to do. It is the server that checks whether the
other party is online and available, and the server that sets up the call. If a client is
unavailable when a message is sent, the server stores the message until they
become available and then forwards it on. Some might ask why a server is
required at all; why not just let the clients communicate directly? This was basically
a design choice. It was the opinion of the developer that direct client to client
communication for all tasks would place an unnecessarily large load on the clients.
If it were up to the clients to do everything, the system would be
slowed down significantly. The server acts as a centre point where clients can
contact each other; without it, clients would have difficulty locating one another.
It was also far more efficient to let the server take some of the load
and leave all administration to it, leaving the clients free to partake in calls,
send messages and so on. It also meant that messages could be sent while clients were
on calls, because the server can store the message, and that messages sent while the
receiving client was offline could be stored until their next login, something that would not
have been possible without a server.
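The request/confirmation exchange over a bidirectional TCP connection can be sketched with standard Java sockets. The sketch below is illustrative only: the port is chosen by the operating system and the message format is hypothetical, not the project's actual protocol.

```java
import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

// Minimal sketch of the client-to-server TCP link: the server accepts a
// connection and acknowledges one request line, the way a call request or
// login message would be confirmed. The "OK:" format is hypothetical.
class TcpSketch {
    static String roundTrip(String request) throws Exception {
        try (ServerSocket server = new ServerSocket(0)) {   // any free port
            Thread serverThread = new Thread(() -> {
                try (Socket s = server.accept();
                     BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
                     PrintWriter out = new PrintWriter(s.getOutputStream(), true)) {
                    out.println("OK: " + in.readLine());    // acknowledge the request
                } catch (Exception ignored) { }
            });
            serverThread.start();
            try (Socket client = new Socket("localhost", server.getLocalPort());
                 PrintWriter out = new PrintWriter(client.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(new InputStreamReader(client.getInputStream()))) {
                out.println(request);                        // e.g. a call request
                return in.readLine();                        // server's confirmation
            }
        }
    }
}
```

The real server keeps one such connection open per client, so either side can send at any time rather than closing after one exchange.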
3.1.2 Client to Client Communication
The other form of communication built into the system architecture is direct
communication between two clients. This occurs at only one time: during a call. As
shown in the previous section, the server is required to set up the call; however, once
the call has been set up, the server drops out and the streams are sent directly between
clients. The client to client architecture is shown in Figure 3.2. This type of
communication exists solely for calls, and it is here that the Real-time Transport
Protocol discussed in section 2.2 is employed. Voice and video streams are synchronised
within the RTP protocol. In section 2.2.4, the section on RTCP, the
canonical name (CNAME) was described as an identifier for every stream.
When two streams, namely a voice and a video stream, have the same CNAME, this implies
that they are being sent from the same source, and they are automatically synchronised.
This was important for the application, as it is a fundamental expectation of a video
conference that the voice and video will be synchronised.
Figure 3.2 - Client to Client Communication
Once again, the decision to choose this architecture can be justified. It would have
been possible to let the server remain in the call, but this would have been of no
benefit. Removing the server from the call reduced the server complexity
and decreased the system load.
3.2 System Design
When undertaking a software project such as the one described in this report, it is
important to have a good design brief. One of the most effective ways to design such a
system is to create class diagrams. These clearly show the methods that are part of
each class as well as the relationships between classes, and can give an in depth
understanding of the overall system.
3.2.1 The Server
The first class diagram, shown in Figure 3.3, is of the server and its related classes.
As described in the previous section, the server plays an integral part in the overall
functionality of the system.
Figure 3.3 - Server Class Diagram
As can be seen, the server is the parent class and there are three child classes:
ServerHandle, ServerSideUtilities and ServerSideStorage. There
can only be one server, and the methods within the server are mainly for the
graphical user interface; they will be inherited by the three child classes. The
ServerHandle is the interface a client communicates with for access to server-side
resources. It is responsible for the way messages are sent and for telling the server
what to do when it receives a message, depending on
the type of message received. There are two types of message, a push message and a
pull message, and both can be sent or received. There are push and
pull links on all clients and on the server. The reason is that, normally in a client
server application, only the client can initiate communication with the server, but by
using push and pull either side can initiate communication. A push message is sent by
either the client or the server, depending on who initiates communication, and the
response to a push message is a pull message. The party that sends a push will receive
back a push, and the party that sends a pull will receive a pull. An example is shown
below, in Figure 3.4. This type of communication would be used in a situation where the
user presses a button on the client side that initiates communication with the server.
Within this application, however, there will usually be only one send and one receive
per task (request and confirmation / error), as opposed to two of each, as shown in the diagram.
Figure 3.4 - Example of Push Pull Message Setup
It should be noted that the layout of the diagram above is not the only one possible;
the client and server could be reversed, with the server sending the push and the client
sending the pull. A typical example of this is when the server receives a message which
has to be pushed on to its destination; this is the case for UMS delivery. The push and
pull messages are dealt with by the ServerHandle. There can be many
ServerHandles associated with one server, as one is created for every client
that con