    Video Conferencing System

    with

    Multimedia Capabilities

    Janet Adams

    April 2005

    BACHELOR OF ENGINEERING

    IN

    TELECOMMUNICATIONS ENGINEERING

    Supervised by Dr. Derek Molloy


    Acknowledgements

    I would like to thank Dr. Derek Molloy, who supervised me on this project, for his

    enthusiasm and guidance. I would also like to thank Edward Casey, whom I

    collaborated with on certain areas of the project, and his supervisor, Dr. Gabriel

    Muntean, for his support and advice. My thanks also go to my friends Edward Casey, Edel Harrington and Hector Climent for listening to me and guiding me through my

    initial presentation. I would like to dedicate this project to my parents, who have

    supported me throughout all my time in college and especially during this, my final

    year.

    Declaration

    I hereby declare that, except where otherwise indicated, this document is entirely my

    own work and has not been submitted in whole or in part to any other university.

    Signed: ...................................................................... Date: ......................................


    Abstract

    This document will describe the development of a video conferencing system with

    multimedia capabilities. The concept of multicasting will be explored as this was used

    in the development of the video conferencing. Other concepts, which were used in the

    development of the system, such as the Java Media Framework, the Real-time Transport Protocol and a number of encoding schemes, will also be investigated.

    The design of the system will explain how each of the features was planned for and

    developed, and will provide the user with an understanding of video conferencing,

    client server communications, motion detection and much more. The implementation

    section reads like a user manual. On completion of this section, the reader should be

    able to make full use of all of the features within the application and should

    understand the depth to which each of the features can be used.

    When this document has been read, the reader will fully understand both how the

    system was developed and how it can be used, as well as the technical background

    needed to understand how the different features work.


    Table of Contents

    ACKNOWLEDGEMENTS
    DECLARATION
    ABSTRACT
    TABLE OF CONTENTS
    TABLE OF FIGURES
    TABLE OF TABLES
    1 INTRODUCTION
    1.1 AIM OF THIS PROJECT
    1.2 CURRENT EXAMPLES OF SIMILAR APPLICATIONS
    1.3 EQUIPMENT AND SOFTWARE
    1.3.1 JBuilder 2005
    1.3.2 Logitech Webcam
    1.3.3 Laptop
    1.3.4 Digital Camcorder
    2 TECHNICAL BACKGROUND
    2.1 JAVA MEDIA FRAMEWORK
    2.1.1 Introduction
    2.1.2 JMF Architecture
    2.1.3 Principal Elements
    2.1.4 Common Media Formats
    2.1.5 Real Time Transport Protocol (RTP) Architecture in JMF
    2.1.6 Alternatives to JMF
    2.1.7 Summary
    2.2 REAL-TIME TRANSPORT PROTOCOL
    2.2.1 Introduction
    2.2.2 Some RTP Definitions
    2.2.3 RTP Data Structures
    2.2.4 RTP Control Protocol
    2.2.5 Alternatives to RTP
    2.2.6 Summary
    2.3 AUDIO ENCODING SCHEME G.723.1
    2.3.1 Introduction
    2.3.2 Encoder Principles
    2.3.3 Decoder Principles
    2.3.4 Alternative Audio Encoding Schemes
    2.3.5 Summary
    2.4 VIDEO ENCODING SCHEME H.263
    2.4.1 Introduction
    2.4.2 Summary of Operation
    2.4.3 Alternative Video Encoding Schemes
    2.4.4 Summary
    2.5 IMAGE OBSERVATION
    2.5.1 Initial Ideas
    2.5.2 The Way it Works
    2.6 MULTICASTING
    2.6.1 Alternatives to Multicasting
    2.6.2 What is Multicasting
    2.7 SUMMARY
    3 DESIGN OF THE SYSTEM
    3.1 SYSTEM ARCHITECTURE
    3.1.1 Client to Server Communication
    3.1.2 Client to Client Communication
    3.2 SYSTEM DESIGN
    3.2.1 The Server
    3.2.2 The Client
    3.3 MESSAGING STRUCTURE
    3.4 CONFERENCING
    3.5 IMAGE OBSERVATION
    3.6 COMMON PROCEDURES WITHIN THE APPLICATION
    3.6.1 Login
    3.6.2 Call Setup
    3.6.3 Call Teardown
    3.6.4 Logout
    3.7 OTHER FEATURES WITHIN THE APPLICATION
    4 IMPLEMENTATION OF THE SYSTEM
    4.1 INTRODUCTION
    4.2 LOGGING IN
    4.3 CALLS
    4.3.1 Making a Peer to Peer Call
    4.3.2 Receiving a Person to Person Call
    4.3.3 Initiating a Conference Call
    4.3.4 Joining a Conference Call
    4.4 MESSAGES
    4.4.1 Sending an MMS Message
    4.4.2 Receiving an MMS Message
    4.4.3 Videomail Messages
    4.5 EXTRA FEATURES
    4.5.1 Image Observation
    4.5.2 Adaption
    4.6 USING THE SERVER
    5 RESULTS AND DISCUSSION
    6 CONCLUSIONS AND FURTHER RESEARCH
    6.1 THE BENEFITS OF THIS PROJECT
    6.2 THE IMPACT OF THIS PROJECT
    6.3 FUTURE RESEARCH POSSIBILITIES
    6.4 MEETING THE REQUIREMENTS
    REFERENCES
    7 APPENDIX 1
    7.1 CALL SETUP REQUEST
    7.2 LOGIN REQUEST
    7.3 LOGOFF REQUEST
    7.4 CALL END REQUEST
    7.5 CONFERENCE SETUP REQUEST
    7.6 ADD PARTICIPANT TO CONFERENCE REQUEST
    7.7 END CONFERENCE REQUEST
    7.8 SEND MESSAGE REQUEST
    7.9 RECEIVE MESSAGE REQUEST
    8 APPENDIX 2
    8.1 IMAGE OBSERVATION CODE


    Table of Figures

    FIGURE 2.1 - MEDIA PROCESSING MODEL
    FIGURE 2.2 - SYSTEM PROCESSING MODEL
    FIGURE 2.3 - JMF BASIC SYSTEM MODEL
    FIGURE 2.4 - RTP AND THE OSI MODEL
    FIGURE 2.5 - RTP PACKET HEADER FORMAT
    FIGURE 2.6 - RTCP SENDER REPORT STRUCTURE
    FIGURE 2.7 - RTCP RECEIVER REPORT STRUCTURE
    FIGURE 2.8 - G.723.1 ENCODER
    FIGURE 2.9 - G.723.1 DECODER
    FIGURE 2.10 - H.263 BASELINE ENCODER
    FIGURE 2.11 - MACROBLOCKS WITHIN H.263
    FIGURE 2.12 - MOTION PREDICTION
    FIGURE 2.13 - ORIGINAL CONFERENCING PLAN
    FIGURE 2.14 - MULTICASTING THROUGH ROUTER
    FIGURE 3.1 - CLIENT TO SERVER COMMUNICATION
    FIGURE 3.2 - CLIENT TO CLIENT COMMUNICATION
    FIGURE 3.3 - SERVER CLASS DIAGRAM
    FIGURE 3.4 - EXAMPLE OF PUSH PULL MESSAGE SETUP
    FIGURE 3.5 - CLIENT CLASS DIAGRAM
    FIGURE 3.6 - ALLOCATING A CONFERENCE POSITION
    FIGURE 3.7 - MESSAGE SEQUENCE CHART FOR CONFERENCE CALL
    FIGURE 3.8 - CONFERENCING SETUP
    FIGURE 3.9 - IMAGE OBSERVATION AVERAGES
    FIGURE 3.10 - MESSAGE SEQUENCE CHART FOR LOGIN
    FIGURE 3.11 - MESSAGE SEQUENCE CHART FOR CALL SETUP
    FIGURE 3.12 - MESSAGE SEQUENCE CHART FOR CALL TEARDOWN
    FIGURE 3.13 - MESSAGE SEQUENCE CHART FOR LOGOUT
    FIGURE 4.1 - LOGIN SCREEN
    FIGURE 4.2 - HOME SCREEN
    FIGURE 4.3 - MAKING A P2P CALL
    FIGURE 4.4 - DURING A CALL
    FIGURE 4.5 - CALL ACCEPT/REJECT
    FIGURE 4.6 - INITIATING A CONFERENCE CALL
    FIGURE 4.7 - CONFERENCE REQUEST
    FIGURE 4.8 - MMS SCREEN
    FIGURE 4.9 - ATTACH BUTTON FILE CHOOSER
    FIGURE 4.10 - MMS SCREEN READY TO SEND
    FIGURE 4.11 - UNIFIED INBOX SCREEN
    FIGURE 4.12 - MESSAGE POPUP WINDOW
    FIGURE 4.13 - LEAVE VIDEOMAIL REQUEST
    FIGURE 4.14 - VIDEOMAIL COMPOSE
    FIGURE 4.15 - VIDEOMAIL POPUP
    FIGURE 4.16 - SERVER LOGIN SCREEN
    FIGURE 4.17 - SERVER ACTIVITY SCREEN
    FIGURE 4.18 - SERVER CLIENT STATUS SCREEN
    FIGURE 4.19 - SERVER CLIENT STATUS SCREEN WITH CLIENTS
    FIGURE 4.20 - SERVER ADMINISTRATION SCREEN


    Table of Tables

    TABLE 2.1 - JMF COMMON VIDEO FORMATS
    TABLE 2.2 - JMF COMMON AUDIO FORMATS
    TABLE 5.1 - TESTING SCENARIOS: LOGIN/LOGOFF
    TABLE 5.2 - TESTING SCENARIOS: MAKING A CALL
    TABLE 5.3 - TESTING SCENARIOS: SENDING A MESSAGE
    TABLE 5.4 - TESTING SCENARIOS: CONFERENCE CALL
    TABLE 5.5 - OTHER TESTING SCENARIOS


    Chapter 1 - Introduction

    Almost all organisations, for example office blocks, colleges and shopping centres,

    have telephone systems installed in them. These telephone systems allow features such

    as call forward, call divert, voicemail, free extension dialling to other users within

    the same network, etc. Computers are another item found in almost all of these

    facilities, usually one per user. Therefore, in the majority of establishments, you will

    find that every employee has a telephone handset and a computer. A cost-effective and

    space-saving idea would be to combine these two everyday utilities so that the

    computer can also be used as a phone. People want their lives and work to be as

    simple and time-efficient as possible, and one way to achieve this is to have a software-

    based telephony system on their computers. Why do they need a physical telephone

    handset when it is possible to attain all the same features on their computers, cutting

    out the expense of the handset?

    1.1 Aim of this Project

    The aim of this project is to develop a video conferencing facility with multicasting

    capability and MMS functionality. The application will be developed in Java making

    use of the Java Media Framework for real time applications. The project will be

    developed in conjunction with Edward Casey, who will develop Videomail and

    adaption features to add to the system.

    1.2 Current Examples of Similar Applications

    There are some examples of software based phone systems available. One example is

    Skype, an internet phone system. This allows users to have voice conversations, free

    of charge, over the internet, provided that the party they are calling is also using the

    Skype service. The disadvantage is that a company employing this system would have

    no control over their users. Another example is Vonage, which offers the same sort of

    service as Skype and hence the same disadvantages.


    1.3 Equipment and Software

    1.3.1 JBuilder 2005

    This is the program that was used to code and compile all of the Java code. The reason

    that this program was chosen is that it was available free of charge and was very

    straightforward to use; it was simple, but it did the job. One of the features that

    was very helpful in this program was that it highlighted any common coding errors,

    which saved a lot of time. In other situations, the developer may not have been

    informed of these errors until after compilation.

    1.3.2 Logitech Webcam

    This was used for the testing of the video calls.

    1.3.3 Laptop

    Testing was difficult as very few of the features could be tested alone. Almost all

    testing required two computers. For this reason, it was most efficient to use two

    laptops connected to two webcams.

    1.3.4 Digital Camcorder

    The digital camcorder was used for the development of the image observation, as the

    low quality of the webcam introduced noise to the image, which hampered the calculation of

    an adequate threshold value.


    Chapter 2 - Technical Background

    In this chapter, the various standards used in the design of this system will be

    discussed. The standards chosen were based on what was supported by the Java Media

    Framework. There were possibly more suitable options available, for example

    with the encoding schemes, but the choice was limited by what was supported by the

    Java Media Framework and the Real-Time Transport Protocol. The standards

    discussed within this chapter were the basic building blocks of this project.

    2.1 Java Media Framework

    2.1.1 Introduction

    It is often the case that a Java developer will want to include some real-time media

    within their Java application or applet. Prime examples of such real-time media would

    be audio and video. The Java Media Framework (JMF) was developed to enable this

    to happen. JMF allows the capture, playback, streaming and transcoding of multiple

    media formats. JMF is an extension of the Java platform that provides a powerful

    toolkit from which scalable, cross platform applications can be developed.

    Any data that changes with respect to time can be characterized as real-time media.

    With real-time media, the idea is that you will see it as it happens. So for example, if

    you are partaking in a video conference, you expect that there should not be a

    significant delay between when the other person says something to you, and when you

    hear and see them saying it. Audio clips, MIDI sequences, movie clips, and animations

    are common forms of time-based media. Such media data can be obtained from a

    variety of sources, such as local or network files, cameras, microphones, and live

    broadcasts. Figure 2.1, below, shows a media processing model. There are three main

    elements within the system - the input, the output and the processor. Think of the input

    as where the data comes from: it could be a capture device such as a video camera, a

    file, or data that has been received over a network. Before the input can

    reach the output, it has to be formatted so that it can be received correctly. This

    formatting takes place in the processor. A processor can do many things to the data,


    some of which include compressing/decompressing, applying effect filters and

    converting into the correct format using the encoding scheme which has been

    specified. Once the data has been correctly formatted by the processor, it is then

    passed on to the output so that the end user can see or hear it. The output could simply

    be a player, such as a speaker or a television; it could save the data to a file, or it could

    send it across the network.

    Figure 2.1 - Media Processing Model

    To relate the media processor model shown above to this particular project, let us take

    a look at Figure 2.2. As can be seen immediately, this system has more components

    than the one shown above. However it can still be divided into the same three parts,

    input, processor and output. The input consists of the MediaLocator which

    represents the address of the device, and the data source, which is constructed using

    the MediaLocator and is the interface to the device. The data is then taken from the

    input and sent to the processor. The processor in the system consists of the processor

    itself, which takes the data and converts it into the encoding scheme that has been

    defined for the system. The other element of the processor is the RTPManager. The

    transmission RTPManager takes the encoded data from the processor and packetizes

    it, so that it can be sent over the network. The data is then transmitted over the

    network where it is met on the other side by the receiver RTPManager, which takes

    the data and depacketizes it, converting it back into a format that can be read by the

    player. Once this stage has been completed, the data is passed to the output, consisting

    of the player and the speaker (the example shown here is for a voice call, the speaker

    could be a monitor or any other sort of output device that the media can be seen or

    heard on). The player takes the encoded data and decodes it, then sends it to the output

    device so that the receiver can see or hear it.


    Figure 2.2 - System Processing Model
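    To make the flow of Figure 2.2 concrete, the fragment below is a minimal sketch of
    the transmit side written against the JMF 2.x API. The capture locator "vfw://0" and
    the destination address are placeholders, and Manager.createRealizedProcessor() is
    used so that the state handling (discussed in Section 2.1.3) stays out of the way.

        import java.net.InetAddress;
        import javax.media.*;
        import javax.media.format.VideoFormat;
        import javax.media.protocol.*;
        import javax.media.rtp.*;

        public class TransmitSketch {
            public static void main(String[] args) throws Exception {
                // Input: the MediaLocator is the address of the device and the
                // DataSource is the interface to it ("vfw://0" is a placeholder).
                MediaLocator locator = new MediaLocator("vfw://0");
                DataSource source = Manager.createDataSource(locator);

                // Processor: encode the raw capture as H.263 framed for RTP.
                Processor processor = Manager.createRealizedProcessor(new ProcessorModel(
                        source,
                        new Format[] { new VideoFormat(VideoFormat.H263_RTP) },
                        new ContentDescriptor(ContentDescriptor.RAW_RTP)));
                processor.start();

                // Transmission RTPManager: packetize the encoded data and send it.
                RTPManager rtp = RTPManager.newInstance();
                rtp.initialize(new SessionAddress(InetAddress.getLocalHost(), 42050));
                rtp.addTarget(new SessionAddress(InetAddress.getByName("192.168.0.2"), 42050));
                SendStream stream = rtp.createSendStream(processor.getDataOutput(), 0);
                stream.start();
            }
        }

    On the receiving side the same pieces appear in reverse: a receiver RTPManager
    depacketizes the stream and hands a DataSource to a Player.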

    2.1.2 JMF Architecture

    The most practical example of real-time media comes from a basic home movie

    system. I have shown this system below in Figure 2.3. Imagine someone is making a

    home movie, the first thing that they do is record it onto a video tape using a

    camcorder. So they are using a capture device (the camcorder) and recording onto a

    data source (the video tape).

    Once they have made the movie, the next logical thing that they would want to do

    would be to watch it. So, thinking of the system processing model, they would need

    some sort of processor that would take the data from the data source and convert it into

    some format that they can see and hear. This processor would be a VCR. When the

    data source is placed into the processor, the data is transmitted to the final stage of the

    system processing model, the output. In this case, the television will be the principal

    output device. There will more than likely be speakers on the television that will

    transmit the audio part of the media. So below we have a very basic processing model

    that many people use every day at home.


    Figure 2.3 - JMF Basic System Model

    Yet even though the model shown in Figure 2.3 seems very basic, it still contains the

    main elements of the more complicated system processing model that is shown above, in

    Figure 2.2.

    2.1.3 Principal Elements

    Data Source

    In JMF, a DataSource is the audio or media source, or possibly a combination of

    the two, e.g. a webcam with an integrated microphone. It could also be an incoming

    stream across a network, for example the internet, or a file. Once the location or

    protocol of the data is determined, the data source encapsulates both the media

    location and the protocol and software used to deliver the media. When a

    DataSource is sent to a Player, the Player is unconcerned about the origin of

    the DataSource.

    There are two types of DataSources, determined by how the data transfer initiates:

    Pull data source: Here the data flow is initiated by the client and the data flow

    from the source is controlled by the client.

    Push data source: Here the data flow is initiated by the server and the data flow

    from the source is controlled by the server.

    Several data sources can be combined into one. So if you are capturing a live scene

    with two data sources: audio and video, these can be combined for easier control.
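    JMF supports this combination directly. A hedged sketch, assuming both capture
    locators exist on the machine and that the code runs inside a method declared to
    throw Exception:

        import javax.media.Manager;
        import javax.media.MediaLocator;
        import javax.media.protocol.DataSource;

        // Combine an audio and a video capture source into a single DataSource
        // for easier control (both locator strings are placeholders).
        DataSource audio = Manager.createDataSource(new MediaLocator("javasound://44100"));
        DataSource video = Manager.createDataSource(new MediaLocator("vfw://0"));
        DataSource combined = Manager.createMergingDataSource(new DataSource[] { audio, video });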


    Capture Device

    A capture device is the piece of hardware that you would use to capture the data,

    which you would connect to the DataSource. Examples would be a microphone or

    a webcam. The captured media can then be sent to the Player, converted into

    another format or even stored to be used at a later stage.

    Like DataSources, capture devices can be either a push or a pull source. If a

    capture device is a pull source, then the user controls when to capture the image; if it is

    a push source, then the user has no control over when the data is captured; it will be

    captured continuously.

    Player

    As mentioned above, a Player takes a stream of data and renders it to an output

    device. A Player can be in any one of a number of states. Usually, a Player would

    go from one state to the next until it reaches the final state. The reason for these states

    is so the data can be prepared before it is played. JMF defines the following six states

    for the Player:

    Unrealized: In this state, the Player object has just been instantiated and

    does not yet know anything about its media.

    Realizing: A Player moves from the unrealized state to the realizing state

    when the Player's realize() method is called. In this state, the Player is

    in the process of determining its resource requirements.

    Realized: Transitioning from the realizing state, the Player comes into the

    realized state. In this state the Player knows what resources it needs and has

    information about the type of media it is to present. It can also provide visual

    components and controls, and its connections to other objects in the system are

    in place. A player is often created already in this state, using the

    createRealizedPlayer() method.

    Prefetching: When the prefetch() method is called, a Player moves

    from the realized state into the prefetching state. A prefetching Player is

    preparing to present its media. During this phase, the Player preloads its

    media data, obtains exclusive-use resources, and does whatever else is needed

    to play the media data.


    Prefetched: The state where the Player has finished prefetching media data;

    it is ready to start.

    Started: This state is entered when you call the start() method. The

    Player is now ready to present the media data.
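    The sketch below illustrates walking through these states: it creates a Player,
    registers a ControllerListener, and drives the Player forward as each transition
    completes. The file URL is a placeholder.

        import javax.media.*;

        public class PlayerStateSketch implements ControllerListener {
            public static void main(String[] args) throws Exception {
                Player player = Manager.createPlayer(new MediaLocator("file:///tmp/clip.mov"));
                player.addControllerListener(new PlayerStateSketch());
                player.realize();               // unrealized -> realizing -> realized
            }

            public void controllerUpdate(ControllerEvent event) {
                Player player = (Player) event.getSourceController();
                if (event instanceof RealizeCompleteEvent) {
                    player.prefetch();          // realized -> prefetching -> prefetched
                } else if (event instanceof PrefetchCompleteEvent) {
                    player.start();             // prefetched -> started
                }
            }
        }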

    Processor

    A Processor is a type of Player that has added control over what processing is

    performed on the input media stream. As well as the six aforementioned Player

    states, a Processor includes two additional states that occur before the

    Processor enters the realizing state but after the unrealized state:

    Configuring: A Processor enters the configuring state from the unrealized

    state when the configure() method is called. A Processor exists in the

    configuring state when it connects to the DataSource, demultiplexes the

    input stream, and accesses information about the format of the input data.

    Configured: From the configuring state, a Processor moves into the

    configured state when it is connected to the DataSource and the data format

    has been determined.

    As with a Player, a Processor transitions to the realized state when the

    realize() method is called.
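    As an illustration, the hedged fragment below walks a Processor into the configured
    state by hand and then chooses an output format for each track before realizing.
    waitForState() stands in for a small helper (not shown) that blocks until the
    matching transition event arrives, and dataSource is assumed to come from earlier
    in the code.

        import javax.media.*;
        import javax.media.control.TrackControl;
        import javax.media.format.AudioFormat;
        import javax.media.format.VideoFormat;

        Processor p = Manager.createProcessor(dataSource);
        p.configure();
        waitForState(p, Processor.Configured);   // the input is demultiplexed here
        for (TrackControl track : p.getTrackControls()) {
            if (track.getFormat() instanceof VideoFormat) {
                track.setFormat(new VideoFormat(VideoFormat.H263_RTP));
            } else if (track.getFormat() instanceof AudioFormat) {
                track.setFormat(new AudioFormat(AudioFormat.G723_RTP));
            }
        }
        p.realize();
        waitForState(p, Controller.Realized);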

    DataSink

    The DataSink is a base interface for objects that read media content delivered by a

    DataSource and render the media to some destination, typically a file.
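    A hedged sketch of a DataSink in use, writing a realized processor's output to a
    file (the path is a placeholder and exception handling is omitted):

        import javax.media.DataSink;
        import javax.media.Manager;
        import javax.media.MediaLocator;
        import javax.media.protocol.DataSource;

        // "processor" is assumed to be a realized Processor whose content
        // descriptor describes a file type rather than RAW_RTP.
        DataSource output = processor.getDataOutput();
        DataSink sink = Manager.createDataSink(output,
                new MediaLocator("file:///tmp/recording.mov"));
        sink.open();
        sink.start();
        // ... stop the processor when finished, then:
        sink.close();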

    Format

    A Format object represents an object's exact media format. The format itself carries no

    encoding-specific parameters or global-timing information; it describes the format's

    encoding name and the type of data the format requires. Format subclasses include:

    AudioFormat

    VideoFormat

    In turn, VideoFormat contains six direct subclasses:

    H261Format

    H263Format


    IndexedColorFormat

    JPEGFormat

    RGBFormat

    YUVFormat

    As will be discussed in more detail later on in this report, the formats that were chosen for this project were H.263 for the video and G.723 mono for the audio.
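    In code, the two chosen encodings are named through these Format subclasses; the
    *_RTP constants select the RTP-framed variant of each codec:

        import javax.media.Format;
        import javax.media.format.AudioFormat;
        import javax.media.format.VideoFormat;

        Format video = new VideoFormat(VideoFormat.H263_RTP);   // H.263 video
        Format audio = new AudioFormat(AudioFormat.G723_RTP);   // G.723.1 mono audio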

    Manager

    A manager, an intermediary object, integrates implementations of key interfaces that

    can be used seamlessly with existing classes. JMF offers four managers:

    Manager: Use Manager to create Players, Processors,

    DataSources, and DataSinks.

    PackageManager: This manager maintains a registry of packages that contain

    JMF classes, such as custom Players, Processors, DataSources, and

    DataSinks.

    CaptureDeviceManager: This manager maintains a registry of available

    capture devices.

    PlugInManager: This manager maintains a registry of available JMF plug-in

    processing components.
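    For example, the CaptureDeviceManager registry can be queried for a device that
    supports a given format; a small sketch (printing whatever the registry happens to
    return on the machine it runs on):

        import java.util.Vector;
        import javax.media.CaptureDeviceInfo;
        import javax.media.CaptureDeviceManager;
        import javax.media.format.AudioFormat;

        // Ask the registry for devices that can capture linear audio.
        Vector devices = CaptureDeviceManager.getDeviceList(new AudioFormat(AudioFormat.LINEAR));
        if (!devices.isEmpty()) {
            CaptureDeviceInfo info = (CaptureDeviceInfo) devices.get(0);
            System.out.println("Found " + info.getName() + " at " + info.getLocator());
        }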

    2.1.4 Common Media Formats

    Table 2.1 and Table 2.2 below identify some of the characteristics of common media

    formats. When selecting the format for this system, the main consideration was the

    bandwidth. This needed to be as low as possible. Obviously, the quality should be as

    high as possible. The CPU requirement wasn't really an issue. Each client would be

    working on a separate computer with its own CPU capabilities, so it wasn't something

    that needed to be taken into account in choosing the encoding schemes.

    So looking at Table 2.1, which lists the most common video formats, it can be seen that

    H.263 is the only one that meets the low bandwidth requirement. The quality is

    medium, which is perfectly acceptable for this sort of application. Therefore, this was

    the video encoding scheme that was chosen.


    Looking at Table 2.2, for the audio, it can be seen that there are two formats that meet

    the low bandwidth requirement. These are GSM and G.723.1. Of these two, the former

    has a low quality while the latter has medium quality. It therefore made more sense to

    choose G.723.1. I have marked the chosen encoding schemes with an asterisk in the tables below.

    Format     Content Type           Quality   CPU Requirements   Bandwidth Requirements
    Cinepak    AVI, QuickTime         Medium    Low                High
    MPEG-1     MPEG                   High      High               High
    H.261      AVI, RTP               Low       Medium             Medium
    H.263 *    QuickTime, AVI, RTP    Medium    Medium             Low
    JPEG       QuickTime, AVI, RTP    High      High               High
    Indeo      QuickTime, AVI         Medium    Medium             Medium

    Table 2.1 - JMF Common Video Formats


    Format              Content Type                Quality   CPU Requirements   Bandwidth Requirements
    PCM                 AVI, QuickTime, WAV         High      Low                High
    Mu-Law              AVI, QuickTime, WAV, RTP    Low       Low                High
    ADPCM (DVI, IMA4)   AVI, QuickTime, WAV, RTP    Medium    Medium             Medium
    MPEG-1              MPEG                        High      High               High
    MPEG Layer 3        MPEG                        High      High               Medium
    GSM                 WAV, RTP                    Low       Low                Low
    G.723.1 *           WAV, RTP                    Medium    Medium             Low

    Table 2.2 - JMF Common Audio Formats

    As it happens, the schemes that were chosen are ideal for the application as H.263 was

    developed for video conferencing applications and is optimised for video where there

    is not much movement, and G.723 is typically used for low bit rate speech, such as telephony applications.

    2.1.5 Real Time Transport Protocol (RTP) Architecture in JMF

    The JMF RTP APIs are designed to work seamlessly with the capture, presentation,

    and processing capabilities of JMF. Players and processors are used to present and

    manipulate RTP media streams just like any other media content. You can transmit


    media streams that have been captured from a local capture device using a capture

    DataSource or that have been stored to a file using a DataSink. Similarly, JMF can be

    extended to support additional RTP formats and payloads through the standard plug-

    in mechanism. [Java Media Framework API Guide, http://java.sun.com/products/java-media/jmf/2.1.1/guide/index.html, November 19, 1999 (April 2005)]

    Session Manager

    In JMF, a SessionManager is used to coordinate an RTP session. The session

    manager keeps track of the session participants and the streams that are being

    transmitted. The session manager maintains the state of the session as viewed from the

    local participant. The SessionManager interface defines methods that enable an

    application to initialize and start participating in a session, remove individual streams

    created by the application, and close the entire session.

    Session Statistics: The session manager maintains statistics on all of the RTP

    and RTCP packets sent and received in the session. The session manager

    provides access to two types of global statistics:

    o GlobalReceptionStats: Maintains global reception statistics for the session.

    o GlobalTransmissionStats: Maintains cumulative transmission statistics for all local senders.

    Statistics for a particular recipient or outgoing stream are available from the

    stream:

    o ReceptionStats: Maintains source reception statistics for an individual

    participant.

    o TransmissionStats: Maintains transmission statistics for an individual

    send stream.

    Session Participants: The session manager keeps track of all of the

    participants in a session. Each participant is represented by an instance of a

    class that implements the Participantinterface. SessionManagers create a

    Participant whenever an RTCP packet arrives that contains a source

    description (SDES) with a canonical name (CNAME) that has not been seen

    before in the session. A participant can own more than one stream, each of


    which is identified by the synchronization source identifier (SSRC) used by the

    source of the stream.

    Session Streams: The SessionManager maintains an RTPStream object for

    each stream of RTP data packets in the session. There are two types of RTP

    streams:

    o ReceiveStream represents a stream that's being received from a

    remote participant.

    o SendStream represents a stream of data coming from the

    Processor or input DataSource that is being sent over the network.

    A ReceiveStream is constructed automatically whenever the session

    manager detects a new source of RTP data.
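    This project uses the RTPManager class, which plays the session manager role in
    JMF 2.x. A hedged sketch of initialising the local end of a session and reading the
    session-wide statistics described above (addresses are placeholders; assume a
    method declared to throw Exception):

        import java.net.InetAddress;
        import javax.media.rtp.*;

        RTPManager session = RTPManager.newInstance();
        session.initialize(new SessionAddress(InetAddress.getLocalHost(), 42050));
        session.addTarget(new SessionAddress(InetAddress.getByName("192.168.0.2"), 42050));

        // Session-wide statistics maintained by the manager.
        GlobalReceptionStats in = session.getGlobalReceptionStats();
        GlobalTransmissionStats out = session.getGlobalTransmissionStats();
        System.out.println("packets received: " + in.getPacketsRecd()
                + ", RTCP packets sent: " + out.getRTCPSent());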

    RTP Events

    RTP-specific events are used to report on the state of the RTP session and streams. To

    receive notification of RTP events, you implement the appropriate RTP listener and

    register it with the session manager:

    SessionListener: Receives notification of changes in the state of the session.

    You can implement SessionListener to receive notification about events

    that pertain to the RTP session as a whole, such as the addition of new

    participants. There are two types of session-wide events:

    o NewParticipantEvent: Indicates that a new participant has joined the session.

    o LocalCollisionEvent: Indicates that the participant's synchronization source is already in use.

    SendStreamListener: Receives notification of changes in the state of an RTP

    stream that's being transmitted. You can implement SendStreamListener to

    receive notification whenever:

    o New send streams are created by the local participant.

    o The transfer of data from the DataSource used to create the send stream has started or stopped.

    o The send stream's format or payload changes.

    There are five types of events associated with a SendStream:

    o NewSendStreamEvent: Indicates that a new send stream has been created by the local participant.


    o ActiveSendStreamEvent: Indicates that the transfer of data from the DataSource used to create the send stream has started.

    o InactiveSendStreamEvent: Indicates that the transfer of data from the DataSource used to create the send stream has stopped.

    o LocalPayloadChangeEvent: Indicates that the stream's format or payload has changed.

    o StreamClosedEvent: Indicates that the stream has been closed.

    ReceiveStreamListener: Receives notification of changes in the state of an

    RTP stream that's being received. You can implement

    ReceiveStreamListener to receive notification whenever:

    o New receive streams are created.

    o The transfer of data starts or stops.

    o The data transfer times out.

    o A previously orphaned ReceiveStream has been associated with a Participant.

    o An RTCP APP packet is received.

    o The receive stream's format or payload changes.

    You can also use this interface to get a handle on the stream and access the

    RTP DataSource so that you can create a MediaHandler.

    There are seven types of events associated with a ReceiveStream:

    o NewReceiveStreamEvent: Indicates that the session manager has created a new receive stream for a newly-detected source.

    o ActiveReceiveStreamEvent: Indicates that the transfer of data has started.

    o InactiveReceiveStreamEvent: Indicates that the transfer of data has stopped.

    o TimeoutEvent: Indicates that the data transfer has timed out.

    o RemotePayloadChangeEvent: Indicates that the format or payload of the receive stream has changed.

    o StreamMappedEvent: Indicates that a previously orphaned receive stream has been associated with a participant.

    o ApplicationEvent: Indicates that an RTCP APP packet has been received.


    RemoteListener: Receives notification of events or RTP control messages

    received from a remote participant. You might want to implement RemoteListener in an

    application used to monitor the session - it enables you to receive RTCP reports and monitor the quality of the session reception without having to

    receive data or information on each stream. There are three types of events

    associated with a remote participant:

    o ReceiverReportEvent: Indicates that an RTP receiver report has been

    received.

    o SenderReportEvent: Indicates that an RTP sender report has been

    received.

    o RemoteCollisionEvent: Indicates that two remote participants are

    using the same synchronization source ID (SSRC).
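    Of these listeners, the ReceiveStreamListener is the one a conferencing client
    cannot do without. A hedged sketch of the usual pattern: on a NewReceiveStreamEvent,
    fetch the stream's RTP DataSource and hand it to a Player.

        import javax.media.Manager;
        import javax.media.Player;
        import javax.media.protocol.DataSource;
        import javax.media.rtp.*;
        import javax.media.rtp.event.*;

        public class ReceiverSketch implements ReceiveStreamListener {
            public void update(ReceiveStreamEvent event) {
                if (event instanceof NewReceiveStreamEvent) {
                    try {
                        ReceiveStream stream =
                                ((NewReceiveStreamEvent) event).getReceiveStream();
                        DataSource source = stream.getDataSource(); // the RTP DataSource
                        Player player = Manager.createPlayer(source);
                        player.realize();   // then prefetch and start as events arrive
                    } catch (Exception e) {
                        e.printStackTrace();
                    }
                }
            }
        }
        // Registered with: rtpManager.addReceiveStreamListener(new ReceiverSketch());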

    2.1.6 Alternatives to JMF

    There was no real alternative to JMF using Java. However, if another programming

    language had been used there would have been alternatives available. An example

    would be to use the C++ programming language in conjunction with the Microsoft

    DirectShow API, which includes libraries for rendering media content. There is an open

    source project being undertaken at the moment to create a SIP communicator using

    Java and the JMF. Aside from this, there are no real similar applications using

    Java, and this was the reason that Java was chosen.

    2.1.7 Summary

    As can be seen from the above sections, JMF is a very powerful tool. It is very easy to

    work with and the best way to understand it is to use it. It is fair to say that there is a

    lot of information, such as forums, help-sites etc. on the World Wide Web regarding

    this subject. However, there is not a lot of information on using JMF for projects

    similar to this one. Perhaps one of the best features of JMF is that it does not require

    one to learn everything about it before using it. With a basic understanding of Java, it

    is possible to teach yourself as you go along.


    2.2 Real-Time Transport Protocol

    2.2.1 Introduction

    The real-time transport protocol (RTP) provides end-to-end delivery services for data

    with real-time characteristics, such as interactive audio and video. These services

    include payload type identification, sequence numbering, time-stamping and delivery

    monitoring. Applications typically run RTP on top of UDP to make use of its

    multiplexing and checksum services; both protocols contribute to parts of the

    transport protocol functionality. However, RTP may be used with other suitable

    underlying network or transport protocols. RTP supports data transfer to multiple

    destinations using multicast distribution if provided by the underlying network. [RTP

    Technology, http://www.ixiacom.com/library/technology _guides /tg_display.php? key

    = rtp, (April 2005)]

    Although RTP is used for real-time media, it does not actually ensure that packets are

    delivered on time itself, but relies on lower layer services to ensure this, and other

    quality-of-service (QOS) guarantees. Each packet has a sequence number and this

    allows the receiver to reconstruct the packets into the correct order.

    In defining RTP, two closely linked parts will be described:

    The real-time transport protocol (RTP), to carry data that has real-time

    properties,

    The RTP control protocol (RTCP), to monitor the quality of service and to

    convey information about the participants in an on-going session.

    The diagram shown below in Figure 2.4 - RTP and the OSI Model

    shows how RTP is incorporated into the OSI model. RTP fits into the session layer of

    the model, between the application layer and the transport layer. RTP and RTCP work

    independently of the underlying Transport Layer and Network Layer protocols.

    Information in the RTP header tells the receiver how to reconstruct the data and

    describes how the codec bit streams are packetized.


    Figure 2.4 - RTP and the OSI Model

    2.2.2 Some RTP Definitions

    RTP payload: The data transported by RTP in a packet, for example audio

    samples or compressed video data.

    RTP packet: A data packet consisting of the fixed RTP header, a possibly

    empty list of contributing sources, and the payload data. Some underlying

    protocols may require an encapsulation of the RTP packet to be defined.

    Typically one packet of the underlying protocol contains a single RTP packet,

    but several RTP packets may be contained if permitted by the encapsulation

    method.

    RTCP packet: A control packet consisting of a fixed header part similar to

    that of RTP data packets, followed by structured elements that vary depending

    upon the RTCP packet type. Typically, multiple RTCP packets are sent

    together as a compound RTCP packet in a single packet of the underlying

    protocol; this is enabled by the length field in the fixed header of each RTCP

    packet.

    Port: The abstraction that transport protocols use to distinguish among

    multiple destinations within a given host computer. TCP/IP protocols identify

    ports using small positive integers. RTP depends upon the lower-layer protocol

    to provide some mechanism such as ports to multiplex the RTP and RTCP

    packets of a session.


    Transport address: The combination of a network address and port that

    identifies a transport-level endpoint, for example an IP address and a UDP

    port. Packets are transmitted from a source transport address to a destination

    transport address.

    RTP session: The association among a set of participants communicating with RTP. For each participant, the session is defined by a particular pair of

    destination transport addresses (one network address plus a port pair for RTP

    and RTCP). The destination transport address pair may be common for all

    participants, as in the case of IP multicast, or may be different for each, as in

    the case of individual unicast network addresses plus a common port pair. In a

    multimedia session, each medium is carried in a separate RTP session with its

    own RTCP packets. The multiple RTP sessions are distinguished by different

    port number pairs and/or different multicast addresses.

    Synchronization source (SSRC): The source of a stream of RTP packets,

    identified by a 32-bit numeric SSRC identifier carried in the RTP header so as

    not to be dependent upon the network address. All packets from a

    synchronization source form part of the same timing and sequence number

    space, so a receiver groups packets by synchronization source for playback.

    Examples of synchronization sources include the sender of a stream of packets

    derived from a signal source such as a microphone or a camera, or an RTP

    mixer. A synchronization source may change its data format, e.g., audio

    encoding, over time. The SSRC identifier is a randomly chosen value meant to

    be globally unique within a particular RTP session. A participant need not use

    the same SSRC identifier for all the RTP sessions in a multimedia session; the

    binding of the SSRC identifiers is provided through RTCP. If a participant

    generates multiple streams in one RTP session, for example from separate

    video cameras, each must be identified as a different SSRC.

    Contributing source (CSRC): A source of a stream of RTP packets that has

    contributed to the combined stream produced by an RTP mixer. The mixer

    inserts a list of the SSRC identifiers of the sources that contributed to the

    generation of a particular packet into the RTP header of that packet. This list is

    called the CSRC list. An example application is audio conferencing where a

    mixer indicates all the talkers whose speech was combined to produce the

    outgoing packet, allowing the receiver to indicate the current talker, even


    though all the audio packets contain the same SSRC identifier (that of the

    mixer).

    End system: An application that generates the content to be sent in RTP

    packets and/or consumes the content of received RTP packets. An end system

    can act as one or more synchronization sources in a particular RTP session, but typically only one.

    Mixer: An intermediate system that receives RTP packets from one or more

    sources, possibly changes the data format, combines the packets in some

    manner and then forwards a new RTP packet. Since the timing among multiple

    input sources will not generally be synchronized, the mixer will make timing

    adjustments among the streams and generate its own timing for the combined

    stream. Thus, all data packets originating from a mixer will be identified as

    having the mixer as their synchronization source.

    Translator: An intermediate system that forwards RTP packets with their

    synchronization source identifier intact. Examples of translators include

    devices that convert encodings without mixing, replicators from multicast to

    unicast, and application-level filters in firewalls.

    Monitor: An application that receives RTCP packets sent by participants in an

    RTP session, in particular the reception reports, and estimates the current

    quality of service for distribution monitoring, fault diagnosis and long-term

    statistics. The monitor function is likely to be built into the application(s)

    participating in the session, but may also be a separate application that does not

    otherwise participate and does not send or receive the RTP data packets. These

    are called third party monitors.

    2.2.3 RTP Data Structures

    Figure 2.5 below shows the structure of an RTP packet; explanations of the

    different components are given first.

    V is the Version, which identifies the RTP version.

    P is the Padding for the protocols or algorithms that require a packet to be a

    specific size. The padding field is a variable field that, when set, indicates that

    the space at the end of the payload is padded with octets to make the packet the

    proper size.


    X is the Extension bit; when set, the fixed header is followed by exactly one

    header extension with a defined format.

    CSRC count contains the number of CSRC identifiers that follow the fixed

    header.

    M is the Marker, whose interpretation is defined by a profile; it is intended to allow significant events such as frame boundaries to be marked in the packet

    stream.

    Payload type - Identifies the format of the RTP payload and determines its

    interpretation by the application. A profile specifies a default static mapping of

    payload type codes to payload formats. Additional payload type codes may be

    defined dynamically through non-RTP means.

    Sequence number increments by one for each RTP data packet sent, and may

    be used by the receiver to detect packet loss and to restore packet sequence.

    Timestamp reflects the sampling instant of the first octet in the RTP data

    packet. The sampling instant must be derived from a clock that increments

    monotonically and linearly in time to allow synchronization and jitter

    calculations.

    SSRC is an identifier that is chosen randomly, with the intent that no two

    synchronization sources within the same RTP session have the same SSRC

    identifier.

    CSRC identifies the contributing sources for the payload contained in this

    packet. This is another layer of identification for sessions that have the same

    SSRC number, but the data in the stream needs to be differentiated further.

 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|X|  CC   |M|     PT      |        Sequence Number        |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                           Timestamp                           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Synchronization Source (SSRC) Identifier            |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|            Contributing Source (CSRC) Identifiers             |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        Payload Packet                         |
|                             ....                              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 2.5 - RTP Packet Header Format
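For illustration, the hypothetical helper below (not taken from the project code; the class name and field handling are purely illustrative) unpacks the fixed twelve-octet header of Figure 2.5 from a received packet buffer in Java:

import java.util.Arrays;

// Hypothetical helper: unpacks the fixed RTP header laid out in Figure 2.5.
public final class RtpHeader {
    public final int version;        // V: 2 bits
    public final boolean padding;    // P: 1 bit
    public final boolean extension;  // X: 1 bit
    public final int csrcCount;      // CC: 4 bits
    public final boolean marker;     // M: 1 bit
    public final int payloadType;    // PT: 7 bits
    public final int sequenceNumber; // 16 bits
    public final long timestamp;     // 32 bits
    public final long ssrc;          // 32 bits

    public RtpHeader(byte[] p) {
        int b0 = p[0] & 0xFF, b1 = p[1] & 0xFF;
        version        = b0 >> 6;
        padding        = (b0 & 0x20) != 0;
        extension      = (b0 & 0x10) != 0;
        csrcCount      = b0 & 0x0F;
        marker         = (b1 & 0x80) != 0;
        payloadType    = b1 & 0x7F;
        sequenceNumber = ((p[2] & 0xFF) << 8) | (p[3] & 0xFF);
        timestamp      = readUInt32(p, 4);
        ssrc           = readUInt32(p, 8);
        // CSRC identifiers, if csrcCount > 0, follow at offset 12.
    }

    private static long readUInt32(byte[] p, int off) {
        return ((long) (p[off] & 0xFF) << 24) | ((p[off + 1] & 0xFF) << 16)
             | ((p[off + 2] & 0xFF) << 8)     | (p[off + 3] & 0xFF);
    }
}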


    2.2.4 RTP Control Protocol

    The RTP Control Protocol (RTCP) works by transmitting periodically to all

    participants in the session, control packets, in much the same manner as data packets

are transmitted. RTCP performs four functions:

1. It provides feedback on the quality of the data distribution.

2. It carries a persistent transport-level identifier for an RTP source called the canonical name or CNAME.

3. By having each participant send its control packets to all the others, each can independently observe the number of participants; this number is used to calculate the rate at which the packets are sent.

4. It conveys minimal session control information; this function is optional. RTCP serves as a convenient channel to reach all the participants, but it is not necessarily expected to support all the control communication requirements of an application.

    Functions 1-3 are mandatory when RTP is used in the IP multicast environment, and

    are recommended for all environments. RTP application designers are advised to avoid

mechanisms that can only work in unicast mode and will not scale to larger numbers of participants.

    RTCP Packet Format

    As mentioned above, RTCP packets are sent periodically to all participants as well as

    the data packets. There are a number of types of RTCP packets:

    Sender Report

    Receiver Report

    Source Description

    Bye

    Application-specific

    All participants in a session send RTCP packets. A participant that has recently sent

    data packets issues a Sender Report (SR). The sender report contains the total number

    of packets and bytes sent as well as information that can be used to synchronize media

    streams from different sessions. The structure of the RTCP SR is shown in Figure 2.6

    below. It consists of three sections, possibly followed by a fourth profile-specific

    extension section if defined.


    The first section, the header, is 8 octets long, with the following fields:

    The version (V) is 2 bits and identifies the version of RTP, which is the same

    in RTCP packets as in RTP data packets.

    The padding (P) is 1 bit, if the padding bit is set, this RTCP packet contains

some additional padding octets at the end which are not part of the control information. The last octet of the padding is a count of how many padding

    octets should be ignored.

    The reception report count (RC) is 5 bits and represents the number of

    reception report blocks contained in this packet.

    The packet type (PT) is 8 bits and contains the constant 200 to identify this as

    an RTCP SR packet.

The length is 16 bits: the length of this RTCP packet in 32-bit words minus one,

    including the header and any padding.

    The SSRC is 32 bits and is the synchronization source identifier for the

    originator of this SR packet.

    The second section, the sender information, is 20 octets long and is present in every

    sender report packet. It summarizes the data transmissions from this sender and has the

    following fields:

    The NTP timestamp is 64 bits and indicates the wallclock time when this

    report was sent so that it may be used in combination with timestamps returned

    in reception reports from other receivers to measure round-trip propagation to

    those receivers.

    The RTP timestamp is 32 bits and corresponds to the same time as the NTP

    timestamp (above), but in the same units and with the same random offset as

    the RTP timestamps in data packets.

    The sender's packet count is 32 bits and is the total number of RTP data

    packets transmitted by the sender since starting transmission up until the time

    this SR packet was generated. The count is reset if the sender changes its

    SSRC identifier.

    The sender's octet count is 32 bits and is the total number of payload octets

    (i.e., not including header or padding) transmitted in RTP data packets by the

    sender since starting transmission up until the time this SR packet was

    generated. The count is reset if the sender changes its SSRC identifier. This

    field can be used to estimate the average payload data rate.


    The third section contains zero or more reception report blocks depending on the

    number of other sources heard by this sender since the last report. Each reception

    report block conveys statistics on the reception of RTP packets from a single

    synchronization source. Receivers do not carry over statistics when a source changes

    its SSRC identifier due to a collision. These statistics are:

    The SSRC_n (source identifier) is 32 bits and is the SSRC identifier of the

    source to which the information in this reception report block pertains.

    The fraction lost is 8 bits and is the fraction of RTP data packets from source

    SSRC_n lost since the previous SR or RR packet was sent, expressed as a fixed

    point number with the binary point at the left edge of the field.

    The cumulative number of packets lost is 24 bits and is the total number of

    RTP data packets from source SSRC_n that have been lost since the beginning

    of reception. This number is defined to be the number of packets expected less

    the number of packets actually received, where the number of packets received

    includes any which are late or duplicates.

    The extended highest sequence number received is 32 bits. The low 16 bits

    contain the highest sequence number received in an RTP data packet from

    source SSRC_n, and the most significant 16 bits extend that sequence number

    with the corresponding count of sequence number cycles.

    The interarrival jitter is 32 bits and is an estimate of the statistical variance of

    the RTP data packet interarrival time, measured in timestamp units and

    expressed as an unsigned integer. The interarrival jitter J is defined to be the

    mean deviation (smoothed absolute value) of the difference D in packet

    spacing at the receiver compared to the sender for a pair of packets.

    The last SR timestamp (LSR) is 32 bits and is the middle 32 bits out of 64 in

    the NTP timestamp received as part of the most recent RTCP sender report

    (SR) packet from source SSRC_n. If no SR has been received yet, the field is

    set to zero.

    The delay since last SR (DLSR) is 32 bits and is expressed in units of 1/65536

    seconds, between receiving the last SR packet from source SSRC_n and

    sending this reception report block. If no SR packet has been received yet from

    SSRC_n, the DLSR field is set to zero.


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|    RC   |   PT=SR=200   |            Length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        SSRC of Sender                         |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|             NTP Timestamp, most significant word              |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|             NTP Timestamp, least significant word             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         RTP Timestamp                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Sender's Packet Count                     |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                     Sender's Octet Count                      |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|                 SSRC_1 (SSRC of first source)                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Fraction Lost |     Cumulative number of packets lost         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Extended highest sequence number received           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Interarrival Jitter                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         Last SR (LSR)                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Delay since last SR (DLSR)                  |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|                 SSRC_2 (SSRC of second source)                |
|                             ....                              |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|                  profile-specific extensions                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 2.6 - RTCP Sender Report Structure

    Session participants periodically issue Receiver Reports (RR) for all of the sources

    from which they are receiving data packets. A receiver report contains information

    about the number of packets lost, the highest sequence number received, and a

    timestamp that can be used to estimate the round-trip delay between a sender and the

    receiver. The format of the receiver report (RR) packet, as shown in Figure 2.7 below,

    is the same as that of the SR packet except that the packet type field contains the

    constant 201 and the five words of sender information are omitted (these are the NTP

    and RTP timestamps and sender's packet and octet counts). The remaining fields have

the same meaning as for the SR packet. An empty RR packet (RC = 0) is put at the head of a compound RTCP packet when there is no data transmission or reception to

    report.


 0                   1                   2                   3
 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|V=2|P|    RC   |   PT=RR=201   |            Length             |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                        SSRC of Sender                         |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|                 SSRC_1 (SSRC of first source)                 |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Fraction Lost |     Cumulative number of packets lost         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|           Extended highest sequence number received           |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                      Interarrival Jitter                      |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                         Last SR (LSR)                         |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|                   Delay since last SR (DLSR)                  |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|                 SSRC_2 (SSRC of second source)                |
|                             ....                              |
+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+=+
|                  profile-specific extensions                  |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Figure 2.7 - RTCP Receiver Report Structure
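Of these report fields, the interarrival jitter is the one a receiver must compute continuously. A minimal sketch of the estimator defined above, updating J as J + (|D| - J)/16 for each arriving packet and assuming arrival times have already been converted to RTP timestamp units, could look like this (the class is illustrative, not part of the application):

// Illustrative interarrival jitter estimator. D compares the packet
// spacing seen at the receiver with the spacing implied by the sender's
// RTP timestamps; J is the smoothed mean deviation of D.
public final class JitterEstimator {
    private double jitter = 0.0;        // J, in timestamp units
    private long prevArrival = -1;      // previous arrival time (receiver clock)
    private long prevRtpTimestamp = -1; // previous RTP timestamp (sender clock)

    public void onPacket(long arrival, long rtpTimestamp) {
        if (prevArrival >= 0) {
            // D = (Rj - Ri) - (Sj - Si): difference in packet spacing
            long d = (arrival - prevArrival) - (rtpTimestamp - prevRtpTimestamp);
            jitter += (Math.abs(d) - jitter) / 16.0;  // smoothed absolute value
        }
        prevArrival = arrival;
        prevRtpTimestamp = rtpTimestamp;
    }

    // Value reported in the 32-bit interarrival jitter field.
    public long reportedJitter() { return (long) jitter; }
}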

    2.2.5 Alternatives to RTP

Once JMF had been chosen, there was really no better option than the Real-time Transport Protocol. It would have been possible to implement a proprietary protocol over UDP or TCP using the custom packetizers provided by JMF. However, TCP is not suitable for real-time data because of the delays it introduces through packet retransmission, and UDP is unsuitable without higher-level features to deal with packet sequencing and loss. Another alternative to RTP could have been RTSP; however, JMF offers only limited support for it.
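Since RTP through JMF was the route taken, the sketch below shows the typical JMF calls for opening an RTP session and transmitting a stream. The data source would be the output of a realized processor, and the host and port values are placeholders; a full transmitter would also handle the processor life cycle and session events:

import java.net.InetAddress;
import javax.media.protocol.DataSource;
import javax.media.rtp.RTPManager;
import javax.media.rtp.SendStream;
import javax.media.rtp.SessionAddress;

// Hedged sketch of RTP transmission through JMF, not the project's code.
public class RtpTransmitSketch {
    public static SendStream transmit(DataSource dataSource,
                                      String remoteHost, int rtpPort) throws Exception {
        RTPManager mgr = RTPManager.newInstance();
        // Bind the local end of the session; RTCP implicitly uses port + 1.
        mgr.initialize(new SessionAddress(InetAddress.getLocalHost(), rtpPort));
        // Register the peer to which RTP data and RTCP reports are sent.
        mgr.addTarget(new SessionAddress(InetAddress.getByName(remoteHost), rtpPort));
        // Wrap the first stream of the data source and start transmission.
        SendStream stream = mgr.createSendStream(dataSource, 0);
        stream.start();
        return stream;
    }
}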

    2.2.6 Summary

The Real-time Transport Protocol is far more extensive than the description above suggests. However, for the way it is used within this project, the detail given here is more than adequate. It is important to understand the packet structures shown, as these form the basis on which all data within the system is sent.


    2.3 Audio Encoding Scheme G.723.1

    2.3.1 Introduction

    As mentioned earlier, the audio encoding scheme which was chosen was G.723.1. This

    format is ideal for compressing the audio signal component of multimedia services at a

    very low bit rate. In this application it will be used for the audio side of the video

    conferencing. The coder that is used was designed to represent speech with a high

    quality using a limited amount of complexity. It is not ideal for audio signals other

    than speech, for example music, but can be used for them.

The coder can operate at one of two bit rates, either 5.3 kbit/s or 6.3 kbit/s. The higher bit rate gives better quality; the lower one, whilst still maintaining adequate quality, offers more flexibility to the designer. Both rates must be implemented in both the encoder and the decoder. [3]

    Audio signals are encoded by the coder in 30 msec frames; there is also a look ahead

    of 7.5 msec. This results in a total delay of 37.5 msec. Any additional delays in the

    operation and implementation of the coder can be attributed to:

    actual time spent processing the data in the encoder and decoder,

    transmission time on the communication link,

    additional buffering delay for the multiplexing protocol.
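To put figures on this: at 6.3 kbit/s, each 30 msec frame carries 6300 bit/s × 0.030 s = 189 bits, transmitted as a 24-octet frame, while at 5.3 kbit/s a frame is 159 bits, carried in 20 octets. The 37.5 msec figure above is therefore an algorithmic minimum, on top of which the three contributions listed are added.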

    2.3.2 Encoder Principles

    The block diagram of the encoder is shown in Figure 2.8 below. As can be seen there

    are a number of different blocks, the functions of which are beyond the scope of this

    project.


Figure 2.8 - G.723.1 Encoder

    2.3.3 Decoder Principles

    The block diagram of the decoder is shown below, in Figure 2.9. It is just shown for

diagrammatic purposes, and the functions of the blocks do not need to be understood for this project.

    Figure 2.9 - G723.1 Decoder


    2.3.4 Alternative Audio Encoding Schemes

    As shown in Table 2.2 JMF Common Audio Formats, the only other format with the

    required low bandwidth is GSM. The reason that this format was not chosen, is that

    G.723 mono is a better quality. This was the only reason for choosing the scheme that

    was chosen. ADPCM(DVI, IMA4) and Mu-Law are also suitable for RTP data,however they do not meet the low bandwidth requirements.

    2.3.5 Summary

    This format was well chosen as it is ideal for the purpose that it will be used for within

    this application, which is basically the voice part of the video conferencing. Although

    it is possible to go very deep into the workings of the coder and decoder, it is not

    necessary for this project. It is sufficient to know the basics of how it works and what

    it is suitable to be used for.

    2.4 Video Encoding Scheme H.263

    2.4.1 Introduction

    The H.263 format is ideal for encoding video images without much movement, at low

    bit rates. Pictures are sampled at an integer multiple of the video line rate. This

    sampling clock and the digital network clock are asynchronous. The transmission

    clock is provided externally. The video bit rate may be variable. [4]

    2.4.2 Summary of Operation

    The diagram in Figure 2.10 shows an H.263 baseline encoder. The algorithms

involved in the operation of this encoder are far beyond the scope of this project. It is

    sufficient to know that it exists and is used in the encoding scheme.


    Figure 2.10 - H.263 Baseline Encoder

    2.4.3 Alternative Video Encoding Schemes

    As shown in Table 2.1 JMF Common Video Formats, the other video formats

supported by RTP include H.261 and JPEG; however, neither of these meets the low

    bandwidth requirement. At the beginning of the project, it was thought that MPEG

    would be used. The reason that this was not chosen is that MPEG does not support

    capture from a live video source. It would only support a pre-recorded video or

    capture from an MPEG enabled data source. This would not have been suitable for

    video calls.

    2.4.4 Summary

    H.263 can be used for compressing the moving picture component of audio-visual

    services at low bit rates. It is ideal for uses in video conferencing as there is not much

    movement involved and low bit rates are used. This makes it the ideal encoding

    scheme for this application.

2.5 Image Observation

    2.5.1 Initial Ideas

    Initially, it was thought that some kind of motion detection algorithm would be used to

    implement the image observation feature. A number of possibilities were looked into

    when researching this prospect, some of which included:


    Motion Estimation: used to predict frames within a video sequence using

    previous frames, with the help of motion vectors. The use of motion vectors

means that only the changes in the frames are sent, as opposed to the whole

    frame.

Fixed Size Block Matching: each image frame is divided into a fixed number of blocks. For each block in the frame, a search is made in the reference frame

    over an area of the image for the best matching block, to give the least

    prediction error.

    Motion Compensation: motion compensation uses blocks from a past frame

    to construct a replica of the current frame. For each block in the current frame

    a matching block is found in the past frame and if suitable, its motion vector is

    substituted for the block during transmission.

    After examining the specification for H.263, it was discovered that there were motion

    detection and compensation algorithms built into it. This meant that the algorithm did

not have to be coded; it was already there and available to use. RTCP reports were

    used to show the byte rate of the video stream, which was then used to implement the

    image observation.

    2.5.2 The Way it Works

    Basically, the H.263 video encoding scheme was used in the implementation of the

    image observation. The motion estimation and compensation that is built into the

    format was used [1]. This assumes that the pixels within a current picture can be

    modelled as a translation of those within a previous picture. Each macroblock is

    predicted from the previous frame. The concept of macroblocks is explained below in

    Figure 2.11. Each pixel within the macroblock undergoes the same amount of

    translational motion, which is represented by two-dimensional motion vectors or

    displacement vectors.


    Figure 2.11 - Macroblocks within H.263
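For illustration, a QCIF picture (176 × 144 pixels, a size typical of video conferencing) divides into (176/16) × (144/16) = 11 × 9 = 99 macroblocks of 16 × 16 pixels, each of which is predicted from the previous frame with its own motion vector.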

    The basic idea behind the motion detection is shown in Figure 2.12 below.

    Figure 2.12 - Motion Prediction


    The way that the above was used for the image observation is as follows. When a

frame hasn't changed, a reference to a previous frame is sent. Basically, the image

    observation feature exploits the temporal redundancy inherent in a video sequence.

    The redundancy is larger when a camera is focused on an image that does not contain

    a lot of movement. This is the case when a user leaves the shot. This redundancy is

    reflected in a reduced RTCP byte rate.

This reference frame is then displayed, which requires a lower byte rate than sending a new frame. The RTCP reports monitor the byte rate of the video stream. If the byte

    rate drops, and stays dropped for a certain period of time, then the call is ended. The

    procedure to end the call is explained in more detail in section 3.5.
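One possible realisation of that monitoring loop is sketched below, using JMF's global reception statistics. The threshold and patience values are illustrative tuning constants, and endCall() is a hypothetical hook into the teardown described in section 3.5:

import javax.media.rtp.GlobalReceptionStats;
import javax.media.rtp.RTPManager;

// Illustrative sketch (assumed names, simplified timing): sample the
// reception statistics once per second and end the call when the byte
// rate stays below a threshold for too long.
public class PresenceMonitor implements Runnable {
    private static final long THRESHOLD_BYTES_PER_SEC = 2000; // assumed tuning value
    private static final int  PATIENCE_SECONDS = 10;          // assumed tuning value

    private final RTPManager manager;
    private long lastTotal = 0;
    private int quietSeconds = 0;

    public PresenceMonitor(RTPManager manager) { this.manager = manager; }

    public void run() {
        try {
            while (!Thread.currentThread().isInterrupted()) {
                Thread.sleep(1000);
                GlobalReceptionStats stats = manager.getGlobalReceptionStats();
                long total = stats.getBytesRecd();
                long rate = total - lastTotal;   // bytes received in the last second
                lastTotal = total;
                quietSeconds = (rate < THRESHOLD_BYTES_PER_SEC) ? quietSeconds + 1 : 0;
                if (quietSeconds >= PATIENCE_SECONDS) {
                    endCall();                   // see section 3.5
                    return;
                }
            }
        } catch (InterruptedException ignored) { }
    }

    private void endCall() { /* hypothetical hook into the call-teardown logic */ }
}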

2.6 Multicasting

    For the conferencing feature of this application, multicasting was used. All of the

    participants within the conference transmit to a multicast address.

    2.6.1 Alternatives to Multicasting

    Another option that was looked at for the conferencing feature was to just allow all

    participants to transmit and receive from and to each other at the same time. This setup

    is shown in Figure 2.13. Basically, when the conference button was pressed, one call

    would have been able to set up on top of another, so that two calls could take place

    simultaneously and that participants would be able to listen for all streams.


    Figure 2.13 - Original Conferencing Plan

    As could be imagined, this method would be very cumbersome. It would use a lot of

system resources, as there would be an unnecessary number of streams being sent. It is

    impractical for a user to have to transmit their data more than once. This idea was

    decided against.

    2.6.2 What is Multicasting

    Multicasting is when a packet is sent to a host group, which is a set of hosts

    identified by a single IP address. A multicast datagram is then delivered to all

    members of the destination group [2]. Hosts may join or leave the group at any time as

membership of the group is dynamic. A host can be a member of more than one group at a

    time and a host does not need to be a member of a group to send datagrams to it. There

    are two types of host groups; a permanent host group is one which has a well-known,

administratively assigned IP address. A permanent host group can

    have any number of members, even zero. The remainder of the IP addresses are

    available for dynamic assignment to the other type of group, which is known as a

    transient group. This second type of group only exists as long as it has members.

    The forwarding of IP multicast datagrams is handled by multicast routers. When a

    datagram is transmitted by the host, it is sent as a local network multicast and will be

    delivered to all members of the destination host group. The addresses which are

    allocated to the host groups are known as class D IP addresses and range from


    224.0.0.0 to 239.255.255.255. The diagram in Figure 2.14 shows how the data is

    distributed to all members of the group.

    Figure 2.14 - Multicasting through Router
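As a minimal illustration, the fragment below joins a transient host group using Java's MulticastSocket, sends a single datagram to the group and receives one back. The class D address and port are arbitrary example values, not the application's own:

import java.net.DatagramPacket;
import java.net.InetAddress;
import java.net.MulticastSocket;

// Minimal sketch of transient host group membership.
public class MulticastSketch {
    public static void main(String[] args) throws Exception {
        InetAddress group = InetAddress.getByName("224.1.2.3"); // class D address
        MulticastSocket socket = new MulticastSocket(5000);
        socket.joinGroup(group);                  // membership is dynamic

        // Any datagram sent to the group address reaches every member.
        byte[] hello = "hello conference".getBytes();
        socket.send(new DatagramPacket(hello, hello.length, group, 5000));

        // Receive one datagram forwarded by the multicast routers.
        byte[] buf = new byte[1500];
        DatagramPacket in = new DatagramPacket(buf, buf.length);
        socket.receive(in);

        socket.leaveGroup(group);                 // hosts may leave at any time
        socket.close();
    }
}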

    2.7 Summary

    The information contained within this chapter has been an invaluable asset in

    developing this application. A firm understanding of all the standards was required

    before coding could even begin. JMF placed a lot of restrictions on the standards that

    could be used. JMF does provide the ability to implement custom packetizers and

custom encoders; however, to do so would have been time-consuming and unnecessary for

    this application.


    Chapter 3

3 Design of the System

3.1 System Architecture

The system as it stands consists of two different communication architectures. One is

    client to server and the other is client to client. The reason that there are two different

    methods is to make the system as efficient as possible. There was the possibility of

    using client to server for all communication; however it was felt that this would be

inefficient, as the server did not need to be part of a call between two clients; it would

    have been an unnecessary use of system resources. For this reason, calls between

    clients are peer to peer and all other communication goes through the server.

    3.1.1 Client to Server Communication

    This architecture is used for all system messages, for the setting up of calls, etc;

    basically for everything other than actual calls. There will be one server and there can

    be any number of clients connected to that server. The client to server configuration is

    shown in Figure 3.1.

    Figure 3.1 - Client to Server Communication


    The connections between the server and the clients are bidirectional TCP connections.

    It was not necessary to use RTP here as they are not real time connections. RTP is

    described in section 2.2 as being ideal for real time communication. The messages that

    are sent between the clients and server will include login, logoff, messages to be sent,

    calls to be made etc. which are not time dependent. The server plays an integral part in

    the system. Basically, all communication between any two clients must first go

    through the server. So if a client wishes to call another client, they must send a call

    request to the server. The server will then proceed to set up the call between the

    clients. The code for this is shown in Appendix 1 in section 7.1. Also included in

    Appendix 1 are the code extracts for login request (section 7.2), logoff request (section

    7.3), call end request (section 7.4), conference setup request (section 7.5), request to

    add a participant to a conference (section 7.6), request to end a conference (section

7.7), request to send a message (section 7.8) and request to receive a message (section 7.9).
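For orientation, the fragment below sketches the general shape of one such exchange over a plain TCP socket. The port number and message strings are illustrative only, not the application's actual protocol constants:

import java.io.BufferedReader;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

// Illustrative server loop: accept a client connection, read one request
// line (e.g. a login or call request) and answer with a confirmation or
// an error. The client side simply opens a Socket to the same port and
// performs the mirrored readLine()/println() calls.
public class ServerSketch {
    public static void main(String[] args) throws Exception {
        ServerSocket listener = new ServerSocket(4444);   // assumed port
        while (true) {
            Socket client = listener.accept();            // one connection per client
            BufferedReader in = new BufferedReader(
                    new InputStreamReader(client.getInputStream()));
            PrintWriter out = new PrintWriter(client.getOutputStream(), true);
            String request = in.readLine();               // e.g. "CALL alice bob"
            if (request != null) {
                out.println("OK " + request);             // confirmation / error reply
            }
            client.close();
        }
    }
}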

    The purpose of including these code extracts is to show that the server really does

    control everything that the clients want to do. It will be the server that will check if the

    other party is online and available, and the server that will set up the call. If a client is

    unavailable when a message is sent, the server will store the message until they

    become available and will then forward it on. Some might ask why a server is

    required, why not just let the clients communicate directly. This was basically a design

    choice. It was the opinion of the developer that direct client to client communication

for all tasks would mean that the load on the clients would be quite large, which was

    unnecessary. If it was up to the clients to do everything, then the system would be

slowed down significantly. The server will act as a centre point, where clients can

    contact each other. Without the server, the clients would have difficulty in contacting

    another client. It was also a lot more efficient to let the server take some of the load

    and leave all administration to the server, leaving the clients free to partake in calls,

    send messages etc. It also meant that messages could be sent while clients are on calls

    because the server can store the message, and messages can also be sent when the

    receiving client is offline and stored until their next login, something that would not

    have been possible without a server.


    3.1.2 Client to Client Communication

    The other form of communication built into the system architecture is direct

    communication between two clients. This will only occur at one time, during a call. As

    shown in the previous section, the server is required to set up the call. However, once

the call has been set up, the server drops out and the streams are sent directly between clients. The client to client architecture is shown in Figure 3.2. This type of

    communication exists solely for calls. This is where the real-time transport protocol

    discussed in section 2.2 will be employed. Voice and video streams are synchronised.

    This is done within the RTP protocol. In section 2.2.4, the section on RTCP, the

    canonical name (CNAME) was described as being an identifier for every stream.

    When two streams, namely a voice and a video, have the same CNAME, it implies

    that they are being sent from the same source and they are automatically synchronized.

    This was important for the application as it is a fundamental expectation of a video

    conference that the voice and video will be synchronised.

Figure 3.2 - Client to Client Communication

    Once again, the decision to choose this architecture can be justified. It would have

    been possible to let the server remain during the call but it would have been of no

    benefit. The decision to remove the server from the call reduced the server complexity

    and decreased the system load.

    3.2 System Design

    When undertaking a software project such as the one described in this report, it is

    important to have a good design brief. One of the most effective ways to design such a

    system is to create class diagrams. These clearly show the methods that are part of

    each class as well as the relationships between classes, and can give an in depth

    understanding of the overall system.


    3.2.1 The Server

The first class diagram, shown in Figure 3.3, is of the server and its related classes.

    As described in the previous section, the server plays an integral part in the overall

    functionality of the system.

    Figure 3.3 - Server Class Diagram

    As can be seen, the server is the parent class and there are three child classes,

    ServerHandle, ServerSideUtilities and ServerSideStorage. There

    can only be one server, and the methods within the server are mainly just for the

    graphical user interface and will be inherited by the three child classes. The

ServerHandle is the interface a client communicates with for access to server side

    resources. The server handle is responsible for the way messages are sent and is

    responsible for telling the server what to do when it receives a message, depending on

    the type of message received. There are two types of message, a push message and a


    pull message, and both these messages can be sent or received. There are push and

    pull links on all clients and on the server. The reason is that normally in a client server

    application, only the client can start communication with the server, but by using push

    and pull either can initiate communication. A push message is sent by either the client

    or server, depending on who initialises communication, and the response to a push

    message is a pull message. The person who sends a push will receive back a push, and

    the person who sends a pull will receive a pull. An example is shown below, in Figure

    3.4. This type of communication would be used in a situation where the user presses a

    button on the client side that initializes communication with the server. However,

    within this application, there will usually only be one send and one receive per task

    (request and confirmation / error), as opposed to two of each, as shown in the diagram.

    Figure 3.4 - Example of Push Pull Message Setup

It should be noted that the layout of the diagram above is not the only one available;

    the client and server could be reversed, with the server sending the push and the client

    sending the pull. A typical example of this is when a server receives a message which

    has to be pushed to its destination; this is the case for UMS delivery. The push and

    pull messages are dealt with by the ServerHandle. There can be many

ServerHandles associated with one server, as there is one created for every client

    that con