msc dissertation - yann guillon - webrtc

University of Kent

MSc Dissertation

Remote Support Services Using Peer to Peer Communication Between Browsers (WebRTC)

Author: Yann Guillon - Ydg2

Project realized with: Quentin Huet

Supervisors: Dr. Frank Wang, Dr. Matteo Migliavacca

January 4, 2015

Contents

1 Introduction
    1.1 Context
    1.2 Objectives
    1.3 Chapter Overview
2 Background: WebRTC and the real-time web
    2.1 The real-time web
        2.1.1 Overview
        2.1.2 HTTP-based Techniques
        2.1.3 WebSocket
    2.2 WebRTC
        2.2.1 Overview
        2.2.2 Technical aspects
        2.2.3 Browsers support
3 Technologies
    3.1 Considerations
        3.1.1 A future oriented architecture
        3.1.2 An environment built for real time interactions
        3.1.3 Scalability and cloud computing
    3.2 The development Stack
        3.2.1 The MEAN Stack as a base
        3.2.2 Stack Additions
    3.3 RTCMultiConnection
        3.3.1 Initialization
        3.3.2 Sessions
        3.3.3 Rooms management
        3.3.4 Signaling
        3.3.5 Media streams gathering
        3.3.6 Error Handling and browser capabilities
4 Project architecture
    4.1 Overview
    4.2 Application flow and features
    4.3 Application architecture
        4.3.1 The client side (1)
        4.3.2 The server side (2)
    4.4 System architecture
        4.4.1 The development environment
        4.4.2 The production environment
5 Project implementation
    5.1 Key points
        5.1.1 Signaling server strategies¹
        5.1.2 AngularJS and WebRTC
    5.2 Teamwork strategy: pair programming
    5.3 Contribution to the related communities
        5.3.1 Contributions to RTCMultiConnection
        5.3.2 Contributions to the MEAN Stack
    5.4 Future work
        5.4.1 Full cloud integration
        5.4.2 Screen interactions
        5.4.3 Files transfer
6 Performance and security concerns
    6.1 Bandwidth and media quality
        6.1.1 Context of the experiment
        6.1.2 Testing environment
        6.1.3 Process followed
        6.1.4 Results and analysis
    6.2 Security concerns
        6.2.1 Data security
        6.2.2 Signaling server concerns
7 Conclusion
8 Annexes
    8.1 Project resources

¹ This section relies on many concepts of RTCMultiConnection; see section 3.3.

1 Introduction

1.1 Context

Since its earliest days, the internet has mainly been used to communicate. During the late 1970s, the introduction of Bulletin Board Systems allowed people to send direct messages to each other. Later, with the tremendous expansion of the World Wide Web, blogging platforms and social networks started to appear. But users wanted more: in the early 2000s, with the appearance of Web 2.0 and the explosion of social media, video over the internet was made popular by software like Skype, iChat, or MSN Messenger.

The large amount of data involved in transmitting video-based content has always been a problem for software companies. The most efficient and least bandwidth-consuming solution found by software creators was to use direct peer-to-peer communication between the software of each participant. Each company created its own proprietary or open-source software to do so.

For the sake of unification and standardization, the WebRTC project was released as open source by Google in 2011. Its aim is to provide a set of technologies that allow seamless peer-to-peer communication directly between web browsers, without the need for any external plug-in. It covers audio, video, screen-sharing, and data transfer. It aims to reshape the real-time communication world, targeting both businesses and individuals.

1.2 Objectives

The aim of this dissertation is to provide a broad overview of the WebRTC project. As the WebRTC technology is still under development, only the currently implemented features will be analysed. This work is based on the project developed alongside this dissertation, which is a full-featured web conferencing website using WebRTC. This dissertation will also provide benchmarks and tests to assess the current usability and performance of WebRTC (see section 8.1).

1.3 Chapter Overview

This dissertation is organized as follows. Chapter 2 reviews the literature about the WebRTC standard and the real-time web.

Chapter 3 gives an overview of the technologies used in our project and the considerations behind our technical choices.

Chapter 4 describes the architecture of the project, including the development environment and the internal structure of the project.

Chapter 5 covers the implementation of the project, including project management, the issues encountered and the successes achieved during its realization.

Chapter 6 analyses the performance of the WebRTC standards using the provided implementation.

2 Background: WebRTC and the real-time web

2.1 The real-time web

2.1.1 Overview

The real-time web consists of sending data through the web as soon as it is made available by its publisher(s). The evolution of online media has been the main driver of the expansion of this technology since the early 1990s: everybody wants to know everything before everyone else.

Ajax

The purpose of Ajax (Asynchronous JavaScript + XML) is to send asynchronous requests to an HTTP server. Ajax is implemented using JavaScript, a language that is interpreted by every recent browser when a web page is loaded. The JavaScript layer of the website submits the request in a defined format, and fetches the response from the HTTP server without the need to reload the page. Requests can be triggered by the user via JavaScript events, such as mouse or keyboard interactions. Ajax is the best-known way to build HTTP-based real-time interactions. [5]

2.1.2 HTTP-based Techniques

Since its invention, HTTP has been built on a request-response structure, and was not designed for real-time interactions. The browser submits a request to an HTTP server, which returns a response to that request. [5]

Figure 1: Classic HTTP interaction [7]

Polling

Polling uses the timer feature of JavaScript to submit Ajax requests periodically. If a change has been made on the server side, the server responds with the new data; if no change has been made, a response is still sent, which wastes resources and bandwidth on both the client and the server [7].

Figure 2: An HTTP real-time technique: Polling [7]
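The cost of polling can be illustrated with a small simulation. This is only a sketch: the server and client below are hypothetical in-memory stand-ins (`createServer`, `createPollingClient` and `handlePoll` are names invented for this example), and the loop replaces the `setInterval()` timer a browser would use to send the Ajax requests.

```javascript
// Simulated server state: a version counter and the latest payload.
// (Hypothetical stand-in for an HTTP endpoint queried through Ajax.)
function createServer() {
  let version = 0;
  let payload = null;
  return {
    publish(data) {
      version += 1;
      payload = data;
    },
    // Every poll gets a response, even when nothing has changed.
    handlePoll(lastSeenVersion) {
      if (version === lastSeenVersion) {
        return { changed: false, version };
      }
      return { changed: true, version, data: payload };
    },
  };
}

// Simulated client: in a browser, poll() would be scheduled with setInterval().
function createPollingClient(server) {
  const stats = { requests: 0, useful: 0, received: [] };
  let lastSeenVersion = 0;
  return {
    stats,
    poll() {
      stats.requests += 1;
      const response = server.handlePoll(lastSeenVersion);
      if (response.changed) {
        stats.useful += 1;
        lastSeenVersion = response.version;
        stats.received.push(response.data);
      }
    },
  };
}

const pollServer = createServer();
const pollClient = createPollingClient(pollServer);

// Ten timer ticks, but the server only changes on ticks 3 and 7.
for (let tick = 1; tick <= 10; tick += 1) {
  if (tick === 3 || tick === 7) pollServer.publish(`update at tick ${tick}`);
  pollClient.poll();
}

const wastedRequests = pollClient.stats.requests - pollClient.stats.useful;
console.log(`${pollClient.stats.requests} requests, ${wastedRequests} carried no new data`);
```

In this run, most of the requests carry no new data, which is the waste described above; the ratio depends entirely on how the polling interval compares with the rate of change on the server.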

Long Polling

Long polling is a technique where the browser submits a request to the server, and the server waits for a change on its side before responding. This technique is much more efficient than classic polling (2.1.2), but it still wastes resources on the server side: as the connection is held open, any extra request from the same client requires a new connection [7].

Figure 3: Another HTTP real-time technique : Long polling [7]
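The long polling behaviour can be sketched in the same simulated style. The code below is an assumption-laden illustration, not a real HTTP server: `createLongPollServer` and `createLongPollClient` are hypothetical names, and a held connection is modelled as a callback parked in an array until data is published.

```javascript
// Simulated long-polling server: requests are parked until data changes.
function createLongPollServer() {
  const pending = []; // held connections, one callback per waiting client
  return {
    // A client "request": the server does not answer immediately.
    handleRequest(respond) {
      pending.push(respond);
    },
    // When something changes, every held request is answered and released.
    publish(data) {
      const waiting = pending.splice(0, pending.length);
      waiting.forEach((respond) => respond(data));
    },
    heldConnections() {
      return pending.length;
    },
  };
}

function createLongPollClient(server) {
  const received = [];
  let requests = 0;
  function listen() {
    requests += 1;
    server.handleRequest((data) => {
      received.push(data);
      listen(); // re-open a request as soon as a response arrives
    });
  }
  return { listen, received, requestCount: () => requests };
}

const longPollServer = createLongPollServer();
const longPollClient = createLongPollClient(longPollServer);

longPollClient.listen();
longPollServer.publish("first");
longPollServer.publish("second");

console.log(longPollClient.received, longPollClient.requestCount(), longPollServer.heldConnections());
```

Every response carries data, but the server keeps one connection held open per listening client at all times, which is the server-side cost mentioned above.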

Streaming

HTTP Streaming is very close to long polling. The only difference is that streaming does not close the connection once data has been sent by the server. HTTP Streaming is considered the most effective HTTP real-time technique, due to its responsiveness, but the data flow is unidirectional: only the server can send messages to the client [5].

Issues with the HTTP-based real-time web

Each technique has its own drawbacks, which can be grouped into two categories.

• Resource and bandwidth problems are found in every technique. This is a consequence of bending the HTTP model. From the server's point of view, either many requests are made (polling, see 2.1.2), or connections have to be held open by the server (long polling and streaming, see 2.1.2). This can pose a scalability problem for applications that have to handle a very large number of users, as the server load for a single user can be multiplied depending on the technique used.

• Timing delays also have to be considered, as no direct connection is established between the clients and the server. A delay is incurred each time the browser needs to ask the server for new information [5].

2.1.3 WebSocket

WebSocket is a protocol that provides bidirectional real-time communication over a single connection between a web browser and a server. WebSocket is built into every recent web browser. Its browser API is standardized by the World Wide Web Consortium (W3C) as part of the HTML5 effort, while the protocol itself is specified by the IETF (RFC 6455). As with the HTTP-based techniques, the JavaScript layer loaded in the web browser handles the connection and the communication with the server. On the server side, a dedicated server has to be used. It needs to implement the WebSocket RFC, typically on top of the Berkeley sockets API [7]. WebSocket addresses the problems raised by the HTTP solutions for real-time interactions (2.1.2).

Figure 4: A simplified WebSocket communication [5]

2.2 WebRTC

2.2.1 Overview

WebRTC is a project that aims to bring together a set of standards in order to provide real-time web communication without the need for any external plug-in or software. WebRTC is designed to be embedded directly into software by its developers. Web browsers are the primary target of the WebRTC project, as they are the main and most common way to access the internet, and are present on every internet-connected device, from phones to computers.

All the media communications in WebRTC are peer-to-peer: each device establishes a direct connection with every device it is communicating with. In the case of multiple participants, a mesh network¹ is created, with the aim of optimizing bandwidth usage and providing the best possible quality.
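As a rough illustration of what a full mesh implies, the number of direct connections can be computed from the number of participants. The helper names below are invented for this sketch, and it assumes that every participant connects to every other one and sends one media stream per peer.

```javascript
// Number of direct peer connections in a full mesh of n participants:
// each of the n participants connects to the n - 1 others, and every
// connection is shared by two participants.
function meshConnectionCount(participants) {
  return (participants * (participants - 1)) / 2;
}

// Number of outgoing media streams each participant has to upload,
// assuming one stream is sent to every other participant.
function uploadsPerParticipant(participants) {
  return Math.max(participants - 1, 0);
}

for (const n of [2, 3, 5, 10]) {
  console.log(
    `${n} participants: ${meshConnectionCount(n)} connections, ` +
      `${uploadsPerParticipant(n)} uploads each`
  );
}
```

The total number of connections grows quadratically with the number of participants, which is worth keeping in mind when sizing a conference.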

The WebRTC standards are not only about web browsers: they also offer the opportunity to communicate with any kind of software supporting the WebRTC standards. Therefore, WebRTC can be extended to telephony (SIP) or to proprietary communication software, in the business sector for example. This makes WebRTC adaptable and extensible.

We will mainly focus on web browsers, as they are the key to the wide adoption of WebRTC. WebRTC in browsers can be seen as a built-in feature. The WebRTC functionalities are accessed through a JavaScript API, which is defined by a specific part of the WebRTC standards [6]. This API provides access to each media resource of the user, and is available through the JavaScript engine of the browser. HTML5 is used to display each media stream, using its video and audio elements.

History and contributors

In 2010, Google acquired On2, a video codec company that had developed the VP series of codecs (including VP8), designed for real-time media; Google released VP8 as open source the same year. Google went on to acquire Global IP Solutions (GIPS) the same year, a company that provided proprietary real-time voice and video technology. Google wanted to make this technology freely and openly available to everybody: the WebRTC project was born. The project quickly gained the support of two standardization bodies: the World Wide Web Consortium (W3C) for the establishment of the standards, and the Internet Engineering Task Force (IETF) for the development of the protocols, through its RTCWEB working group. [2]

Early adopters use cases

The two largest internet companies by revenue in 2014 already operate software running on WebRTC. The first example is Google, whose communication platform Google Hangouts uses a proprietary plug-in running WebRTC to handle audio, video and screen-sharing during conversations. Amazon also integrates WebRTC into its customer service, called Mayday, which features the standard WebRTC audio and video, but also remote help via screen-sharing and control [9].

¹ A mesh network is a network topology in which all the hosts are connected to each other without a central hierarchy.

2.2.2 Technical aspects

Overview of a standard WebRTC application

The simplified architecture of a basic WebRTC application can be split into two distinct parts.

The first part is the client side. It is composed of the web browser, with its WebRTC bindings and its JavaScript layer. This layer is commonly added by developers to script the behaviour of their application in the user's browser. In a WebRTC application, it has two extra roles. First, it contains the calls to the WebRTC API provided by the browser, mixed with the client-side business logic of the program. Second, it communicates with the other mandatory part of a WebRTC application: the signaling server.

The signaling server links the different clients connected to the main server. Signals are data messages sent through the network that contain various control information; they are detailed in section 2.2.2.

The connection between the signaling server and the JavaScript layer can be made using any kind of real-time web protocol (see section 2.1). A classic HTTP request is not suitable because the client cannot predict when information about other participants will become available [6].

Figure 5: Real-Time scheme of WebRTC [4]

Sessions management

Each time participants want to create a conference, a WebRTC session is created. All the data to be transferred is exchanged within the scope of this session.

The current WebRTC communication model allows two types of configuration. In the triangular configuration, a single web server instance is used for all the participants. The second configuration, known as trapezoidal, allows multiple signaling server instances to handle different clients, using a protocol such as Jingle to share the session information between the signaling servers. These two architectures are very similar and, for the sake of simplicity, we will only focus on the triangular architecture [4].

Establishment

The establishment of a WebRTC session is done in five steps.

• As for any other website, the first step is to fetch the public files from the server. This includes the JavaScript layer mentioned in section 2.2.1, alongside the HTML and CSS code. The JavaScript is loaded into the browser, and the connection between the JavaScript and the signaling server is established.

• Once the client is connected to the signaling server, messages following the Session Description Protocol (SDP) are exchanged with the server to define the set of technologies to use during the session (2.2.2).

• The initiation of the session through the signaling server is then finished. The browser itself now needs to establish a peer-to-peer link with the other browsers involved. To do so, the ICE hole punching technique is used. Described in section 2.2.2, this technique uses a remote server to open bidirectional access to each client, regardless of the topology of its network.

• The next step is a handshake to exchange the keys used if a secured transfer (using SRTP, see 2.2.2) takes place.

• Finally, the media session is opened, using either RTP or SRTP, and the communication can start [4].

Figure 6: Session establishment of a triangular WebRTC communication [4]

ICE servers and hole punching

With the massive use of Network Address Translators (NAT) and proxies by internet users, connecting users peer-to-peer without opening any port on their routers is difficult, as users tend to be hidden behind a local sub-network or restricted by firewalls. To solve this problem, a technique called hole punching was created. Following the Interactive Connectivity Establishment (ICE) protocol, this technique consists of multiple steps. The first one is to gather as much information as possible about the path of packets from the browser to the internet, that is, as many IP addresses as possible along the transit of a packet. To do so, a request is made to a third-party server, known as a Session Traversal Utilities for NAT (STUN) server, which is used to get the public IP address of the caller. The local IP address of a client can be fetched using local network utilities on the machine. The IP addresses are then sent through the signaling server, in a process known as candidate exchange [1]. Each ICE agent (integrated in browsers) then performs connectivity checks with the provided IP addresses, and selects the most suitable one for the transfer.

Traversal Using Relays around NAT (TURN) servers can also be used when peers are unreachable. TURN servers act as relays, by assigning each party a publicly accessible relay IP address that forwards the data to the party that requested it. In the case of an ICE communication, this IP address is added to the list of candidate IP addresses [1].
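The candidate selection described above can be sketched as follows. The priority formula and the type preferences are the ones recommended by the ICE specification (RFC 5245); everything else is an assumption made for illustration: the addresses are example values, and `selectCandidate` replaces real connectivity checks (which operate on pairs of local and remote candidates) with a simple reachability predicate.

```javascript
// Type preferences recommended by RFC 5245 (higher is preferred).
const TYPE_PREFERENCE = { host: 126, srflx: 100, relay: 0 };

// priority = 2^24 * type preference + 2^8 * local preference + (256 - component ID)
function candidatePriority(type, localPreference = 65535, componentId = 1) {
  return TYPE_PREFERENCE[type] * 2 ** 24 + localPreference * 2 ** 8 + (256 - componentId);
}

// Hypothetical candidates gathered by one ICE agent (example addresses).
const candidates = [
  { type: "relay", address: "203.0.113.10", port: 50000 }, // allocated by a TURN server
  { type: "host", address: "192.168.1.20", port: 54321 }, // local interface
  { type: "srflx", address: "198.51.100.7", port: 61000 }, // public address seen by a STUN server
].map((candidate) => ({ ...candidate, priority: candidatePriority(candidate.type) }));

// Stand-in for real connectivity checks: keep the reachable candidates
// and pick the one with the highest priority.
function selectCandidate(list, isReachable) {
  const reachable = list.filter(isReachable).sort((a, b) => b.priority - a.priority);
  return reachable.length > 0 ? reachable[0] : null;
}

// Peers on different networks: the host (local) address is not reachable.
const behindNat = selectCandidate(candidates, (c) => c.type !== "host");
console.log(behindNat.type, behindNat.address);

// Restrictive network: only the TURN relay works.
const relayOnly = selectCandidate(candidates, (c) => c.type === "relay");
console.log(relayOnly.type, relayOnly.address);
```

The ordering reflects the idea in the text: a direct local path is preferred when it works, the public address discovered through STUN comes next, and the TURN relay is the fallback.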

Signaling

The purpose of the signaling server is to handle messages called signals, which coordinate and control a WebRTC communication [2]. The way signaling is implemented is up to the developer, as it is not standardized. As in every web standard, only the elements that need to be standardized are [4]. Signaling remains mandatory, but a WebRTC implementation does not require it to follow any specific structure in order to function.

The signaling server handles four principal tasks:

• The negotiation of media and session settings covers the session establishment calls. Using SDP, each party shares its media formats and codecs, as well as information about the bandwidth and the IP addresses to be used during the ICE hole punching phase (2.2.2) [4].

• The credentials management of users is also handled by the signaling server. It needs to identify each party and remember them, using storage in either memory or a database system.

• The management of the current session covers all the user actions that can occur during a session: the addition of new participants, new media being shared or added, users hanging up or dropping out during the session, and the session termination.

• The glare resolution removes the risk of conflicting simultaneous connection attempts. It establishes a master/slave relation between two nodes, using timeslots, and therefore avoids conflicts [4].
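Since signaling is not standardized, the sketch below is only one possible shape for it, with no networking involved. `SignalingRelay` and its methods are hypothetical names; the class keeps sessions in memory, forwards negotiation messages between participants, and notifies members when somebody joins or leaves. Credentials management and glare resolution are deliberately left out.

```javascript
// Minimal in-memory signaling relay (no network): it remembers who is in a
// session and forwards control messages between participants.
class SignalingRelay {
  constructor() {
    this.sessions = new Map(); // sessionId -> Map(userId -> deliver callback)
  }

  join(sessionId, userId, deliver) {
    if (!this.sessions.has(sessionId)) this.sessions.set(sessionId, new Map());
    const members = this.sessions.get(sessionId);
    // Tell the existing members that somebody arrived.
    members.forEach((notify) => notify({ type: "joined", from: userId }));
    members.set(userId, deliver);
  }

  leave(sessionId, userId) {
    const members = this.sessions.get(sessionId);
    if (!members || !members.delete(userId)) return;
    members.forEach((notify) => notify({ type: "left", from: userId }));
    if (members.size === 0) this.sessions.delete(sessionId); // session termination
  }

  // Forward an offer, an answer or an ICE candidate to one participant.
  send(sessionId, from, to, message) {
    const members = this.sessions.get(sessionId);
    const deliver = members && members.get(to);
    if (!deliver) return false;
    deliver({ ...message, from });
    return true;
  }
}

const relay = new SignalingRelay();
const aliceInbox = [];
const bobInbox = [];

relay.join("room-1", "alice", (message) => aliceInbox.push(message));
relay.join("room-1", "bob", (message) => bobInbox.push(message));
relay.send("room-1", "alice", "bob", { type: "offer", sdp: "v=0 ..." });
relay.send("room-1", "bob", "alice", { type: "answer", sdp: "v=0 ..." });
relay.leave("room-1", "bob");

console.log(aliceInbox.map((message) => message.type));
console.log(bobInbox.map((message) => message.type));
```

In a real deployment the `deliver` callbacks would write to a WebSocket or a similar real-time connection, and the `sdp` fields would carry complete session descriptions instead of the truncated placeholders used here.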

Main Protocols

WebRTC is an aggregation of both long-established, heavily used protocols and very recent, effective ones [4].

Video codec

Video is handled by the VP8 codec, in a WebM video container. VP8 used to be proprietary, but Google acquired it and released its source code in 2010. It is generally considered less efficient in nearly every respect than its widely used competitor, H.264, but H.264 is patent-encumbered, and therefore could not be included in the WebRTC project for reasons of source code transparency and openness. [2]

Media Transport

The negotiation of the technologies to be used between participants is done using the Session Description Protocol. First published in 1998, this protocol describes the media used during a session. The first party shares the media it supports, and the other party chooses among them according to its own list of supported media [3]. This protocol has been widely used alongside other protocols, such as SIP in VoIP systems.
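The offer/answer principle can be sketched with plain objects. This is a simplification: real SDP is a line-based text format, and the codec lists and the `negotiate` function below are invented for this example.

```javascript
// The offerer lists what it supports, in order of preference.
const offer = {
  audio: ["opus", "G722", "PCMU"],
  video: ["VP8", "H264"],
};

// The answerer's own capabilities (hypothetical values).
const answererSupport = {
  audio: ["PCMU", "opus"],
  video: ["VP8"],
};

// For every offered media type, keep the codecs that both sides support,
// preserving the offerer's order of preference. Media types without any
// common codec are left out of the answer.
function negotiate(offered, supported) {
  const answer = {};
  for (const [media, codecs] of Object.entries(offered)) {
    const common = codecs.filter((codec) => (supported[media] || []).includes(codec));
    if (common.length > 0) answer[media] = common;
  }
  return answer;
}

const answer = negotiate(offer, answererSupport);
console.log(answer);
```

With these example lists, the answer keeps two audio codecs and a single video codec; a media type that the answerer cannot handle at all simply does not appear in the answer.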

After the negotiation, the transport of media streams follows the Real-time Transport Protocol (RTP) standard. It handles the transport of real-time data (i.e. media streams) over IP networks, and defines a packet format for sending this data. There is no verification that the data reached the intended receiver, but since media streams are being transferred, occasional data loss is tolerable [8]. A secured version of this protocol, SRTP, can also be used to add encryption and authentication to the transfer of media [4].
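To make the packet format more concrete, the sketch below builds and parses the 12-byte fixed RTP header defined in RFC 3550, using Node.js buffers. The function names and the field values are assumptions chosen for this example; header extensions, CSRC lists and sequence number wrap-around are ignored.

```javascript
// Build the 12-byte fixed RTP header (RFC 3550).
function buildRtpHeader({ payloadType, sequenceNumber, timestamp, ssrc, marker = false }) {
  const header = Buffer.alloc(12);
  header[0] = 2 << 6; // version 2, no padding, no extension, no CSRC entries
  header[1] = (marker ? 0x80 : 0) | (payloadType & 0x7f);
  header.writeUInt16BE(sequenceNumber & 0xffff, 2);
  header.writeUInt32BE(timestamp >>> 0, 4);
  header.writeUInt32BE(ssrc >>> 0, 8);
  return header;
}

function parseRtpHeader(buffer) {
  return {
    version: buffer[0] >> 6,
    marker: (buffer[1] & 0x80) !== 0,
    payloadType: buffer[1] & 0x7f,
    sequenceNumber: buffer.readUInt16BE(2),
    timestamp: buffer.readUInt32BE(4),
    ssrc: buffer.readUInt32BE(8),
  };
}

// RTP does not retransmit: the receiver can only notice gaps in the
// sequence numbers (wrap-around is ignored in this sketch).
function countLost(sequenceNumbers) {
  if (sequenceNumbers.length === 0) return 0;
  const first = Math.min(...sequenceNumbers);
  const last = Math.max(...sequenceNumbers);
  return last - first + 1 - new Set(sequenceNumbers).size;
}

const rtpHeader = buildRtpHeader({
  payloadType: 96, // a dynamic payload type, as negotiated through SDP
  sequenceNumber: 1,
  timestamp: 90000,
  ssrc: 0x12345678,
});
const parsedHeader = parseRtpHeader(rtpHeader);

console.log(rtpHeader.toString("hex"));
console.log(parsedHeader);
console.log(countLost([1, 2, 4, 5, 8]));
```

The sequence number and timestamp fields are what allow a receiver to reorder packets, play them at the right time and estimate how much was lost, without any acknowledgement being sent back.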

2.2.3 Browsers support

WebRTC is not yet supported by every popular web browser available today. In fact, there are large gaps between the features supported by each browser. Browsers can be classified into three groups.

Good participation: Google and Mozilla, with their respective browsers Chrome and Firefox, have made great efforts to include every WebRTC specification in their browsers since the beginning of the WebRTC project. Both browsers include almost the whole specification, with a slight advantage for Google Chrome, which can be explained by the fact that Google has driven WebRTC since the technology first appeared.

Medium participation: In March 2014, Opera Software released a version of its browser including the most basic functionalities of WebRTC, which shows an interest in this technology. As this support is recent, only simple audio and video calls are possible.

No participation yet: Apple and Microsoft have not shown any interest in the WebRTC project yet. Microsoft is considering integrating Object Real-time Communications (ORTC) into Internet Explorer. ORTC is an object-oriented variant of WebRTC, where all the data transfer is automated and sent using peer-to-peer communication only.

Figure 7: WebRTC support in mid-August 2014. Source: iswebrtcreadyyet.com

3 Technologies

3.1 Considerations

3.1.1 A future oriented architecture

WebRTC is still in its early development stages, and will surely evolve alongside technologies that will become popular in the near future. To stay relevant, we focused on the most recent technologies currently available that are becoming popular. We also looked at the key projects already developed for WebRTC, to decide which technologies would be the most suitable for our implementation. The guidelines of the WebRTC and RTCWEB projects have also been taken into account.

3.1.2 An environment built for real time interactions

In order to function smoothly and quickly, WebRTC signaling strategies require the most efficient and responsive web techniques available today. Our concern was to have an environment that is genuinely built for live interactions, without the need to add external plug-ins or layers, in order to offer easy maintainability and ease of use.

3.1.3 Scalability and cloud computing

PaaS¹ cloud services are becoming a standard way to maintain performance and scalability. PaaS preserves the whole “web application” concept: websites are no longer seen as a tangle of scripts, but as an application as a whole. Still from a future-oriented point of view, it is a real must-have for current web applications. Most PaaS solutions also offer metrics and analysis of the performance of the application, which are useful for further experiments.

3.2 The development Stack

3.2.1 The MEAN Stack as a base

Presentation

The MEAN stack is an open-source development stack featuring recent web technologies (MongoDB as a database, Express as a back-end web framework, AngularJS as a front-end web framework and Node.js as a back-end platform). This stack provides very useful features for developers. First, it wraps all these technologies in a coherent, ready-to-use environment, as every component is linked to the others. This saves a considerable amount of time, as developers do not need to build their own stack, which can be a very long task if they have not fully mastered the technologies. Second, it offers powerful Command Line Interface (CLI) tools for monitoring and debugging, reporting for example the uptime of each component. Third, it includes many productivity tools, such as the live reload of both the server and the web page whenever a change is detected in the code.

Two different but very similar implementations of the MEAN stack are available: “MEAN.io”, which is owned by Linovate Corp., and “MEAN.js”, maintained by the original creator of the MEAN stack.

The MEAN.io stack was better suited to this project, as it is more popular and structured, and provides a package manager. The fact that a whole company is behind the project also promises more frequent updates and quicker responses to bugs or requests.

The MEAN stack has two drawbacks for the implementation of our project. First, the learning curve is significant. Because every component of the stack is integrated with the others, and because the whole architecture is difficult to understand without a good knowledge of each element, it is very hard to master in the early stages of development. Second, it does not provide a WebSocket architecture by default, which is essential for our signaling server, but one can easily be added to the Express framework.

¹ Platform as a Service: a cloud offering that provides a ready-made stack and platform as a service.

Node.js and Express

Node.js is a full-JavaScript server-side development platform for building web applications. Its main advantage, and the reason for its creation, is its speed. As JavaScript is the only universal language for client-side web scripting, the competition between modern web browsers (mainly Google Chrome, Mozilla Firefox and Microsoft Internet Explorer) has been very intense, resulting in more and more powerful JavaScript virtual machines to load web pages faster. Node.js takes the virtual machine of Google Chrome (V8), one of the fastest currently available, and uses it to execute server-side code. But it is not only about speed: Node.js also produces lightweight applications, as its structure is minimalistic. It therefore improves scalability for demanding applications.

Node.js features a well-known package manager, Node Package Manager (NPM), which allows developers to easily publish their libraries. The community can then include those extensions easily in their projects.

Express.js, as an application framework, adds a complete structure to Node.js to build web applications quickly and easily. To follow the Node.js guidelines on lightness and performance, its structure is very minimalistic. It follows the classic Model-View-Controller (MVC) design pattern. In the MEAN stack, Express.js is combined with a Node.js object modelling library, mongoose, which provides an abstraction over the persistent data to be stored (the “model” in the MVC architecture). The stack also features an authentication utility, passport, which handles user management and provides third-party login through several well-known authentication providers, such as Facebook, Twitter, and Google.

Figure 8: The MVC architecture, as used by Express.js. Source: yalantis.com

AngularJS

AngularJS is a front-end JavaScript framework maintained by Google. Its main feature is to extend the basic HTML syntax, allowing cleaner and more maintainable code. It brings together the following features: dynamic data binding directly inside the HTML, the ability to repeat DOM elements, helpers to build HTML forms and, finally, reusable HTML modules. Its structure is based on three main components:

• The view can be seen as the structure of what is going to be displayed on the web page. It combines HTML and CSS, and uses data bindings to display the dynamic data.

• The controller uses the Angular API to fetch the dynamic data from the server, and then makes it available to the view.

• The model structures the data that is going to be exchanged on the web page, either from the server through the controller, or from user input. It also features validation helpers, for example if a form needs to be filled in on the page.

Figure 9: The basic architecture of AngularJS

MongoDB

MongoDB is a Database Management System (DBMS) following the NoSQL approach. The purpose of NoSQL database systems is to avoid the table-based schema of classic SQL systems, and to store data directly as objects with dynamic schemas. MongoDB is designed to handle large volumes of data and to be integrated into scalable environments. For this purpose, it features automated load balancing using horizontal scaling, an advanced indexing system, and a file storage that splits elements into parts, which can then be stored seamlessly across different MongoDB instances.

3.2.2 Stack Additions

Socket.io and WebSockets

The MEAN stack does not provide tools for real-time interactions by default. We chose Socket.io, a JavaScript library that establishes real-time event-driven communications using either HTTP techniques (2.1.2) or WebSockets (2.1.3). Socket.io is made of two distinct libraries, one for the client and one for the server. Its main strength is its ease of use, as minimal code needs to be written on each side. Its event-driven architecture is also simple yet powerful. Events are identified by a name and contain a message. Each party only has to define the behaviour to run when a given event is received.
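The event-driven model described above can be illustrated without any network. The sketch below does not use Socket.io: `EventChannel` and `connectPair` are hypothetical stand-ins written for this example, which only reproduce the idea of named events carrying a message, with each side registering the behaviour to run on reception.

```javascript
// Tiny event channel standing in for one end of a real-time connection.
class EventChannel {
  constructor() {
    this.handlers = new Map(); // event name -> list of handlers
    this.peer = null; // the other end of the connection
  }

  // Define the behaviour to run when a named event is received.
  on(eventName, handler) {
    if (!this.handlers.has(eventName)) this.handlers.set(eventName, []);
    this.handlers.get(eventName).push(handler);
  }

  // Send a named event and its message to the other end.
  emit(eventName, message) {
    const handlers = this.peer.handlers.get(eventName) || [];
    handlers.forEach((handler) => handler(message));
  }
}

// Create two connected ends, one for the client and one for the server.
function connectPair() {
  const clientEnd = new EventChannel();
  const serverEnd = new EventChannel();
  clientEnd.peer = serverEnd;
  serverEnd.peer = clientEnd;
  return { clientEnd, serverEnd };
}

const { clientEnd, serverEnd } = connectPair();
const eventLog = [];

serverEnd.on("chat message", (text) => {
  eventLog.push(`server got: ${text}`);
  serverEnd.emit("chat ack", text.length);
});
clientEnd.on("chat ack", (length) => {
  eventLog.push(`client got ack for ${length} characters`);
});

clientEnd.emit("chat message", "hello");
console.log(eventLog);
```

Each side only declares handlers for the event names it cares about; an event with no registered handler is silently ignored in this sketch.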

3.3 RTCMultiConnection

RTCMultiConnection is a JavaScript library for creating WebRTC applications with multiple participants. It allows people to share video, sound, screen and data through WebRTC channels. Unlike the other libraries available, it only wraps the WebRTC calls at a fairly low level, giving developers more flexibility to create complex WebRTC experiments. Its creator, Muaz Khan, is very responsive to questions and bug reports.

3.3.1 Initialization

The initialization of the RTCMultiConnection library takes place in four simple steps:

• Initializing the Session Description Protocol (SDP) and its constraints

• Initializing the ICE servers, which can be configured using the iceServers() method

• Setting the bandwidth options

• Initializing and testing the WebSocket communication

3.3.2 Sessions

The RTCMultiConnection library allows the creation of local media sessions for each user. These contain information about each media shared by the user. A simple call is required to define the media.

    connection.session({
        audio: true,  // audio only connection
        video: true,  // audio + video connection (disables audio only)
        screen: true, // screen sharing
        data: true    // direct data transfer, i.e. files
    });

When the session is initialized locally, the initialization steps of WebRTC are carried out automatically, and the defined streams (audio, video, screen-sharing) are dynamically added or removed using the WebRTC API calls:

• RTCPeerConnection.addStream()

• RTCPeerConnection.removeStream()

3.3.3 Rooms management

3.3.4 Signaling

RTCMultiConnection has its own signaling strategy. It creates multiple channels (each identified by a unique hash), one for each data stream. The default signaling implementation uses the WebSocket JavaScript library. However, the whole signaling method can be changed by overriding the openSignalingChannel() method. Therefore, any real-time web service that supports the publish/subscribe (Pub/Sub) mechanism can be used to handle signaling.

    connection.openSignalingChannel = function (config) {
        var channel = config.channel || defaultChannel;

        // The connection to the main channel, expressing the need to join
        // a sub-channel, has to be made here.

        var socket;

        // The sub-channel has to be opened here, using the channel variable
        // as an identifier, and needs to be stored locally (in the socket
        // variable in this example).

        socket.send = function (message) {
            // The socket logic to send and receive messages has to be
            // implemented here.
        };
    };

The openSignalingChannel() function is called every time RTCMultiConnection needs to open a channel. The config variable passed to the function contains the channel to open. The first step is to connect to the main channel, to tell the server that a sub-channel connection is required. The next step is to connect to the sub-channel, and the last is to override the send method on the socket, since send is the method that RTCMultiConnection calls on it.
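
The override described above can be sketched with an in-memory publish/subscribe broker in place of a real-time service, so that the example runs on its own. The createBroker helper is hypothetical, and the config.onmessage and config.callback hooks are assumptions about what the library passes in; they should be checked against the RTCMultiConnection documentation before being relied on.

```javascript
// In-memory publish/subscribe broker standing in for a real-time service.
// Hypothetical helper: not part of RTCMultiConnection or Socket.io.
function createBroker() {
  const subscribers = {}; // channel name -> array of sockets
  return {
    subscribe(channel, socket) {
      (subscribers[channel] = subscribers[channel] || []).push(socket);
    },
    publish(channel, sender, message) {
      (subscribers[channel] || []).forEach((socket) => {
        if (socket !== sender) socket.onmessage(message); // no echo to the sender
      });
    },
  };
}

const broker = createBroker();

// Same shape as the override: read the channel from config, subscribe to it,
// and expose a send() method on the socket.
function openSignalingChannel(config) {
  const channel = config.channel || "default-channel";
  const socket = { channel, onmessage: config.onmessage };
  broker.subscribe(channel, socket);
  socket.send = (message) => broker.publish(channel, socket, message);
  if (config.callback) config.callback(socket); // assumed "channel is open" hook
  return socket;
}

// Usage: two peers on the same channel exchange one signaling message.
const inboxA = [];
const inboxB = [];
const peerA = openSignalingChannel({ channel: "room-1", onmessage: (m) => inboxA.push(m) });
const peerB = openSignalingChannel({ channel: "room-1", onmessage: (m) => inboxB.push(m) });
peerA.send({ type: "offer" });
console.log(inboxA.length, inboxB.length);
```

Only the body of subscribe and publish would change when moving to a real Pub/Sub service; the contract seen by the library (a socket with a send method, fed by incoming messages) stays identical.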

3.3.5 Media streams gathering

RTCMultiConnection delivers the different streams through a single method, onStream(), which is called when a stream is received. The event contains the stream itself and various information about its originator and its kind (audio, video, screen-share or data). It also indicates whether the stream is local or remote. The stream can be bound to an HTML5 video element using the createObjectURL() JavaScript method.

3.3.6 Error Handling and browser capabilities

Many methods are available to simplify the complex error handling of WebRTC. All the following methods can be overridden so that errors are displayed in a way that fits any user interface.

Errors

• The onError() method handles RTCMultiConnection errors.

• The onMediaError() method handles the general media errors of WebRTC media connections, such as stream errors and connection-related issues.

Browser Capabilities

• The DetectRTC object detects whether media devices, such as a web camera, a microphone, or any external plug-ins, are present.

• The connection.UA object contains various information about the browser.

4 Project architecture

4.1 Overview

The purpose of the project we implemented, called OpenHangouts, is to provide a working demonstration of the main features of WebRTC. It uses new and upcoming web technologies that are likely to be widely used once the WebRTC project is complete and adopted by the community.

4.2 Application flow and features

OpenHangouts is a simple and easy-to-use conferencing website. After an authentication step, using local or social media authentication, users are able to join conferencing rooms. The user who creates a room simply copies the room identifier given when the room is opened and gives it to the other participants. People who have joined the room can then communicate using audio and video.

A presenter role also exists: the presenter can share what is displayed on their screen (or a part of it) with all the other participants in the conference. This role is first assigned to the creator of the session, and the current presenter can then hand it over to any other participant.

4.3 Application architecture

The purpose of this section is to explain how the different technologies listed in section 3 interact with each other within the project, and to highlight the main parts that we developed.

Figure 10: Application Architecture of the OpenHangouts project

4.3.1 The client side (1)

AngularJS

AngularJS (see 3.2.1), an MVC front-end framework, handles all of the display and interactions on the client side of the project. It uses a URL routing system to display content dynamically depending on the URL requested by the browser. If a URL matches a defined route, the corresponding controller is called. It generates a view with dynamic content depending on the data sent by the server via an API.

WebRTC as an AngularJS service

AngularJS (3.2.1) is modular: non-specific modules can be added as services, which controllers can query for data or actions. The bindings to the RTCMultiConnection library are integrated into such a service. It groups all the actions that can be executed on the RTCMultiConnection library, together with a callback system that notifies controllers when a change occurs within the service itself, for example the connection of a new participant.

4.3.2 The server side (2)

The server side is composed of two distinct servers. First, Express.js handles the HTTP requests made by the browser, following the MVC architecture on the server side. Its controllers interact with Mongoose through two main data models (User and Channel) and classic CRUD actions. Second, the Socket.io server acts as the signaling server. It relies on sub-channels, known as namespaces, to fulfil this role. It also uses the channel controller from Express.js to access the Channel model.

The signaling management is split into two groups of namespaces. Socket.io makes it easy to create namespaces, which have their own logic and can be seen as “rooms”. A namespace can be selected using a simple URL parameter.

From the server's point of view:

    // channel is defined in the calling URL; of() selects the namespace to use
    io.of(DEFAULT_URL + channel)
        .on('connection', function (socket) {
            // The signaling logic for this specific namespace can be defined here
        });

From the client's point of view:

    // channel represents the namespace to join
    var namespace_socket = io.connect(DEFAULT_URL + channel);

    // Actions can be bound on the namespace_socket variable

When a client connects to the signaling server, it first connects to the main Socket.io server and then submits multiple requests to access different namespaces.

• The first namespaces to join handle the classical WebRTC signaling actions; they are created dynamically by the RTCMultiConnection library.

• For each session, one additional namespace is joined. Created by us, this namespace is used to manage extra actions, in our case the presenter management.
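
The separation between the two groups of namespaces can be illustrated with a small in-memory model, given here as a sketch rather than as the project's server code. The createServer helper and the namespace names are hypothetical; the point is only that handlers registered in one namespace never see events emitted in another.

```javascript
// Minimal in-memory model of namespaces: each namespace keeps its own
// handlers, so events never leak from one namespace to another.
// createServer() and of() are hypothetical helpers, not Socket.io itself.
function createServer() {
  const namespaces = {};
  return {
    of(name) {
      if (!namespaces[name]) {
        const handlers = {};
        namespaces[name] = {
          on(event, handler) {
            (handlers[event] = handlers[event] || []).push(handler);
          },
          emit(event, payload) {
            (handlers[event] || []).forEach((handler) => handler(payload));
          },
        };
      }
      return namespaces[name];
    },
  };
}

const server = createServer();
const log = [];

// One namespace for WebRTC signaling, one for the presenter management.
server.of("/room-42/signaling").on("message", (m) => log.push("signaling:" + m));
server.of("/room-42/custom").on("setPresenter", (p) => log.push("presenter:" + p.id));

server.of("/room-42/custom").emit("setPresenter", { id: "alice" });
server.of("/room-42/signaling").emit("setPresenter", { id: "bob" }); // no handler here

console.log(log);
```

Keeping the presenter events in their own namespace means the custom logic cannot interfere with the channels that the library opens for itself.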


4.4 System architecture

Figure 11: System Architecture of the OpenHangouts project

The system architecture can be split into two parts.

4.4.1 The development environment

The development environment is minimal. A Node.js instance runs locally, with a local MongoDB database connected to it. For testing purposes, multiple browsers are run locally. A GitHub repository is also used. Its main purpose is to manage versions of the source code with git. GitHub is also a way to expose the source code to the community, which can review bugs, propose improvements, or modify the code from any point of the project. The MEAN stack provides debugging tools and useful plug-ins such as code syntax validators (JSLint, CSSLint).

4.4.2 The production environment

The production environment is hosted on Modulus, a PaaS cloud platform made for Node.js hosting. It offers several useful features:

• Easy code deployment using a single command on a command line interface. The code is compressed and then fully deployed on a cloud instance.

• On-demand scaling. Computing instances can easily be added using the web interface provided.

• Metrics and statistics about the running project, available from the web interface.

A shared MongoDB server also runs on the Modulus cloud. It is shared by all instances and also offers scaling opportunities.

5 Project implementation

5.1 Key points

5.1.1 Signaling server strategies ¹

Problem

We wanted to make it possible to build custom features on top of the basic capabilities of the signaling server. As seen in section 3.3.4, the RTCMultiConnection library calls the openSignalingChannel() method each time a user needs to connect to a specific signaling channel to get or send control messages. The openSignalingChannel() method uses a channel stored in the configuration of RTCMultiConnection, and no control is given over the channels that are opened, as RTCMultiConnection uses this method for its own signaling strategy.

Resolution

We went through several attempts before finding the most appropriate solution to this problem.

• First, we tried to add the extra methods directly to the sockets opened by the openSignalingChannel() method, but this resulted in the extra methods being called multiple times on the clients. As RTCMultiConnection creates multiple channels through the same function, the socket function was bound once for each channel opened.

• We then tried setting the config.channel variable in the WebRTC connection. The problem is that this channel variable can be rewritten at any time by RTCMultiConnection, which prevented us from overriding it ourselves.

• To finally solve this problem, we created a method that is analogous to openSignalingChannel(), but which gives us full control over the behaviour of the signals sent through it, and which is called when a room is created.

¹ This section relies on many concepts of RTCMultiConnection; please see section 3.3.

    var openCustomActionsChannel = function (channel, connection) {
        // A request for a custom channel is sent here
        io.connect(SIGNALING_SERVER).emit('new-custom-channel', {
            channel: channel,
            sender: Global.user._id
        });

        // Channels are stored locally for deletion purposes
        self.channels[channel] = channel;

        // The custom socket is opened on the right sub-channel
        self.mysock = io.connect(SIGNALING_SERVER + channel, { custom: true });

        // All the custom signaling actions can be defined here; this example
        // sets a new presenter through the signaling server
        self.mysock.setPresenter = function (id) {
            self.mysock.emit('setPresenter', {
                id: id
            });
        };

        // ...
    };
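
The duplicate calls observed in the first attempt can be reproduced with a short, self-contained sketch. The createSocket helper and the three channel names are hypothetical stand-ins for the channels that the library opens; the numbers only illustrate the mechanism and are not measurements from the project.

```javascript
// Tiny emitter written for this illustration (hypothetical helper).
function createSocket() {
  const handlers = {};
  return {
    on(event, fn) { (handlers[event] = handlers[event] || []).push(fn); },
    emit(event, data) { (handlers[event] || []).forEach((fn) => fn(data)); },
  };
}

let naiveCalls = 0;
let dedicatedCalls = 0;

// Naive approach: every channel opened by the library gets the custom handler.
const librarySockets = ["offer", "answer", "candidates"].map(() => {
  const socket = createSocket();
  socket.on("setPresenter", () => { naiveCalls += 1; });
  return socket;
});

// Dedicated approach: one extra socket, opened once per room, carries the handler.
const customSocket = createSocket();
customSocket.on("setPresenter", () => { dedicatedCalls += 1; });

// One presenter change reaches every socket that the client holds.
librarySockets.forEach((socket) => socket.emit("setPresenter", { id: "alice" }));
customSocket.emit("setPresenter", { id: "alice" });

console.log(naiveCalls, dedicatedCalls);
```

The handler bound on every library channel runs once per channel, whereas the handler on the dedicated socket runs exactly once, which is the behaviour the custom channel was created to obtain.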

5.1.2 AngularJS and WebRTC

Problem

We wanted the data on the front end of our application to be updated automatically each time a modification is made in our AngularJS service. AngularJS controllers use the $scope variable to store and update variables in real time. The problem is that queries to our AngularJS service only go one way: if a variable changed within the service, the controller could not know that a modification had been made, as the controller was only active when the user was active.

Resolution

To solve this problem, we used the observer design pattern. The AngularJS controller registers its interest in modifications through a registerObserverCallback() method. Each time a modification is made within the service itself, a notifyObservers() method is triggered and the controller is notified of the changes.

    // In the service, a list of observers is created. A notifyObservers
    // function is also created, to call each registered observer callback.
    var observerCallbacks = [];

    var notifyObservers = function () {
        angular.forEach(observerCallbacks, function (callback) {
            callback();
        });
    };

    // registerObserverCallback is exposed to the caller
    return {
        registerObserverCallback: function (callback) {
            observerCallbacks.push(callback);
        }
    };

    // In the controller, the registering method can be called, and the data
    // (in this case the stream of the screen currently shared) is updated
    // automatically.
    WebRTC.registerObserverCallback(function () {
        $scope.screen = WebRTC.getScreen();
        $scope.$apply();
    });

    <!-- The $scope variable is bound to the elements in the view, allowing
         them to update automatically. The <div> element containing the
         screen stream only appears if $scope.screen is set (ng-show
         directive), and uses $scope.screen.stream as its source. -->
    <div class="screen-container container">
        <video ng-show="screen" ng-src="{{screen.stream}}" autoplay></video>
    </div>
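
The same observer mechanism can be exercised without AngularJS, which makes the notification flow easier to follow. In this sketch a plain object plays the role of $scope, and the setScreen function is a hypothetical stand-in for a screen stream arriving from a peer; neither is taken from the project's code.

```javascript
// Framework-free version of the service: it stores callbacks and calls each
// one whenever its internal state changes.
function createWebRTCService() {
  const observerCallbacks = [];
  let screen = null;

  const notifyObservers = () => observerCallbacks.forEach((callback) => callback());

  return {
    registerObserverCallback(callback) { observerCallbacks.push(callback); },
    getScreen() { return screen; },
    // Hypothetical setter standing in for a stream arriving from a peer.
    setScreen(newScreen) {
      screen = newScreen;
      notifyObservers();
    },
  };
}

// "Controller" side: a plain object plays the role of $scope.
const service = createWebRTCService();
const scope = { screen: null };
service.registerObserverCallback(() => { scope.screen = service.getScreen(); });

// A change inside the service is pushed to the controller automatically.
service.setScreen({ stream: "blob:screen-1" });
console.log(scope.screen.stream);
```

In the AngularJS version the callback additionally calls $scope.$apply(), because the change originates outside of the framework's own digest cycle.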

5.2 Teamwork strategy: pair programming

My co-worker and I tried to work out the most effective way to produce the best quality software possible in the time given for this project. We came to the conclusion that pair programming was the most suitable approach for us. Pair programming means that two developers work on the same feature of a project at a single screen. Our choice was motivated by the following reasons:

• Our background knowledge of the required technologies was limited at the beginning of the project.

• We were sure to gain skills at the same pace and to always have the same level of knowledge of the project.

• It also helped us produce better quality code, as one developer can spot a better way to do a certain task.

• The complexity of various parts of the project really required two developers working on them at the same time, especially the integration of the AngularJS WebRTC service (5.1.2) and the signaling server (5.1.1).

At the beginning and the end of the project, we took the liberty of doing small, non-blocking tasks separately, such as small features to improve the usability of the project, the design and its integration, and finally the error handling.

5.3 Contribution to the related communities

5.3.1 Contributions to RTCMultiConnection

We had the opportunity to discover a major issue in the RTCMultiConnection library. When the sharing of a medium was stopped using the built-in button in Google Chrome, the stream continued to flow, and if the user decided to open this stream again, an SDP (see 2.2.2) error reported that the connection was called in the wrong state. After multiple e-mail exchanges with Muaz Khan, the creator of the library, we fixed the bug together in the latest version of his library (2.4).

OpenHangouts will also be released soon on the RTCMultiConnection website in the demonstration category.

5.3.2 Contributions to the MEAN Stack

The MEAN stack offers the possibility to create code packages that can be installed directly in any MEAN code structure, using the built-in package manager or the Node Package Manager (see 3.2.1). We decided to release our code as one of those packages, so that our whole conferencing website can be included in any kind of application with minimal integration code.

5.4 Future work

5.4.1 Full cloud integration

Only one step is missing to achieve full cloud integration and scaling in our application. Horizontal scaling balances the load across several identical instances of the application, using a shared database. As these first two points are already covered by our project, the only thing left to add is the ability to share WebSocket server connections between multiple instances, using, for example, a shared Redis¹ database.
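
The missing step can be pictured with a small in-memory model, given here as a sketch under stated assumptions rather than as a deployment recipe. The shared broker plays the role that a shared store such as Redis would play; createSharedBroker and createInstance are hypothetical helpers, and no real Redis or Socket.io call is used.

```javascript
// Shared in-memory broker playing the role of the shared store.
// Hypothetical helper: a real deployment would use an external service.
function createSharedBroker() {
  const instances = [];
  return {
    attach(instance) { instances.push(instance); },
    publish(event, payload) {
      instances.forEach((instance) => instance.deliver(event, payload));
    },
  };
}

// One application instance with its own locally connected clients.
function createInstance(broker) {
  const clients = [];
  const instance = {
    connect(client) { clients.push(client); },
    // Broadcasting goes through the broker so that other instances see it too.
    broadcast(event, payload) { broker.publish(event, payload); },
    deliver(event, payload) {
      clients.forEach((client) => client.inbox.push({ event, payload }));
    },
  };
  broker.attach(instance);
  return instance;
}

const sharedBroker = createSharedBroker();
const instanceA = createInstance(sharedBroker);
const instanceB = createInstance(sharedBroker);

// Two users of the same room end up on different instances.
const alice = { inbox: [] };
const bob = { inbox: [] };
instanceA.connect(alice);
instanceB.connect(bob);

instanceA.broadcast("setPresenter", { id: "alice" });
console.log(alice.inbox.length, bob.inbox.length);
```

Without the shared broker, a message broadcast by the first instance would only reach the clients connected to that instance, and participants of the same room could miss signaling messages.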

5.4.2 Screen interactions

Many interactions could be added to the screen sharing feature, such as drawings on the screen shared via a WebRTC data connection, or a point-and-click system allowing participants to interact more with the presenter.

5.4.3 Files transfer

Peer-to-peer file transfer could also be included easily, as a raw data connection is present by default in WebRTC. This feature is quite basic and is not considered a real breakthrough, unlike screen sharing and video sharing.

¹ Redis is a “key-value” data store mainly used for scalability purposes.

6 Performance and security concerns

6.1 Bandwidth and media quality

6.1.1 Context of the experiment

The Internet has to be seen as a public resource: the quality of a user's internet connection depends on many factors, such as the coverage in their country, the type of connection used (modem, ADSL, optical fibre, 3G, etc.), and various parameters within their area (quality of the line, distance to the internet relay). We decided to run an experiment to test the quality of WebRTC connections by limiting the bandwidth available to them, and to relate this limit to the performance and usability of WebRTC services.

6.1.2 Testing environment

• The computer used for the test was a Lenovo ThinkPad W520 running two distinct Google Chrome instances.

• The internet connection used was a fast optical fibre connection, with a bandwidth far exceeding the needs of each test.

• The OpenHangouts project was hosted remotely at the following URL: https://openhangouts.uni.me

6.1.3 Process followed

The bandwidth was limited using a call to the RTCMultiConnection library that manually restricts the bandwidth allocated to each type of stream.
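
One common way of restricting the bandwidth of a stream is to add a "b=AS:" line (in kilobits per second) to the corresponding media section of the SDP. The sketch below shows that general technique on a deliberately truncated SDP sample; it is an assumption that the library works along similar lines, and the limitBandwidth function is written for this illustration only.

```javascript
// Insert a "b=AS:<kbps>" line into the media section of the given kind.
// Illustrates SDP-level limiting in general, not the library's internals.
function limitBandwidth(sdp, mediaKind, kbps) {
  const lines = sdp.split("\r\n");
  const start = lines.findIndex((line) => line.startsWith("m=" + mediaKind));
  if (start === -1) return sdp; // no such media section

  // In SDP field order, b= comes after the optional i= and c= lines.
  let insertAt = start + 1;
  while (insertAt < lines.length && /^(i|c)=/.test(lines[insertAt])) insertAt += 1;

  // Replace an existing b=AS line instead of adding a second one.
  const hasLimit = insertAt < lines.length && lines[insertAt].startsWith("b=AS:");
  lines.splice(insertAt, hasLimit ? 1 : 0, "b=AS:" + kbps);
  return lines.join("\r\n");
}

// Truncated, hand-written SDP sample (not a complete session description).
const sdp = [
  "v=0",
  "m=audio 9 UDP/TLS/RTP/SAVPF 111",
  "c=IN IP4 0.0.0.0",
  "a=mid:audio",
  "m=video 9 UDP/TLS/RTP/SAVPF 100",
  "c=IN IP4 0.0.0.0",
  "a=mid:video",
].join("\r\n");

const limited = limitBandwidth(sdp, "video", 300);
console.log(limited.split("\r\n"));
```

Only the video section receives the limit here; the audio section is left untouched, which matches the idea of restricting each type of stream separately.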

For each test, we first lowered the bandwidth to the point where the media was no longer displayed. We then increased the bandwidth until the usability changed slightly.

To benchmark the video displayed, we modified a piece of benchmarking code¹ that uses Google Chrome system functions, which allowed us to obtain an accurate frame rate and resolution for the video under analysis.
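
The frame-rate part of such a benchmark boils down to sampling a cumulative frame counter at regular intervals and dividing the difference by the elapsed time. The sketch below shows that calculation only; the sample values are hand-written for illustration and are not measurements from our tests, and the way the counter is read from the browser is left out as an assumption.

```javascript
// Compute frames per second from two samples of a cumulative frame counter.
// The samples below are hand-written for illustration, not measured values.
function framesPerSecond(previous, current) {
  const elapsedSeconds = (current.timeMs - previous.timeMs) / 1000;
  if (elapsedSeconds <= 0) return 0; // guard against identical timestamps
  return (current.frames - previous.frames) / elapsedSeconds;
}

const samples = [
  { timeMs: 0, frames: 0 },
  { timeMs: 1000, frames: 30 },
  { timeMs: 2000, frames: 42 },
];

// One rate per pair of consecutive samples.
const rates = samples.slice(1).map((sample, i) => framesPerSecond(samples[i], sample));
console.log(rates);
```

Sampling once per second keeps the arithmetic simple, while a shorter interval would react faster to sudden drops at the cost of noisier values.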

6.1.4 Results and analysis

Video and sound sharing

¹ http://webrtchacks.com/mirror-framerate/

Figure 12: Results of bandwidth tests on the video and audio streams

WebRTC provides a decent conversation quality even at very low upload rates, but as seen in almost every result, high activity in front of the web camera causes the frame rate to drop to a very low value.

An automatic resizing of the video can also be noted, even if it is minor. The WebRTC algorithms favour high resolution over frame rate: the resolution of the video could have been reduced much further, which would have yielded more frames per second.

Finally, we can see that a bandwidth of around 300 kb/s ensures a very good conversation quality, even with high activity on the web camera.

Screen sharing


Figure 13: Results of bandwidth tests on the screen sharing stream

When it comes to screen sharing, the WebRTC algorithms do not lower the resolution to obtain a higher frame rate. In every test, high activity on the screen resulted in a dramatic drop in frames per second, as the shared screen used a high-resolution display. The only notable element is that the frame rate increases with the bandwidth when no big changes occur on the screen (for example, mouse movements).

6.2 Security concerns

6.2.1 Data security

The first concern is about data and media security. Most existing WebRTC applications use unprotected data transfer over RTP, for the sake of speed or simplicity, which can result in the interception of the media. There is also a problem of transparency between the user and the actions carried out by WebRTC. Once media sharing has been accepted by a user, the choice is remembered by the browser, and data can then be recorded maliciously. A client may also be infected by a virus or malicious software that loads on top of WebRTC and fetches the shared media from all the parties in the conference. To counter these problems, developers can set up secure protocols for the transmission of the data itself, with both parties encrypting the streams.

6.2.2 Signaling server concerns

The signaling server implementation is also a key point when it comes to security. This part is not standardized, yet it is important because it deals with reasonably sensitive data, such as IP addresses, sessions, and various information about the browsers of each participant. The use of SSL communications for the real-time interactions can add a layer of security, but developers still need to be careful and fully aware of the data being manipulated.

7 Conclusion

The implementation of this project has been very rewarding for us. First, it allowed us to learn a lot about various cutting-edge technologies alongside WebRTC. We also went through a complete process of research and implementation, ending with a finished product usable online. Working on very recent concepts is far from easy. Our teamwork strategy helped us finish the project on time despite the complexity of some parts of the implementation. This project will hopefully be reused by the WebRTC community, especially by developers wanting to add WebRTC features either to the MEAN stack or to an AngularJS project.

WebRTC on its own is a very promising technology that represents a major breakthrough in the world of web conferencing, through its simplicity of use and its support by the most trusted authorities on the web. The performance shown by WebRTC is good but can be improved for large conferences, for example with the possibility of using relay servers to manage the data transfer for users with limited bandwidth. The possible success of WebRTC depends first on the willingness of the major web browser vendors to implement it. Second, the actors of the web scene can also decide whether WebRTC becomes the only real-time communication standard on the web, by using it in popular projects. The support of ORTC by Internet Explorer shows that real-time communication on the web may split into two sides, or that WebRTC may evolve in a different way.

Word count excluding figures : 7395

8 Annexes

8.1 Project resources

A fully working demonstration of the project is available online.

https://openhangouts.uni.me

The project has also been submitted to the WebRTC-Experiment project, and will soon be available in the list of public demonstrations.

https://www.webrtc-experiment.com/RTCMultiConnection/

Developers can also get full access to the code through our public repository. It can be used to analyse the structure of the project, modify it, or report bugs and desired improvements.

http://github.com/overlox/openhangouts

Brief documentation for installation and use can also be found at:

http://github.com/overlox/openhangouts/wiki

References[1] D. K. Bryan Ford, Pyda Srisuresh, Peer-to-peer communication

across network address translators.http://www.brynosaurus.com/pub/net/p2pnat/, feb 2005.

[2] S. Dutton, Getting started with webrtc.http://www.html5rocks.com/en/tutorials/webrtc/basics/, jul 2012.

[3] M. Handley and V. Jacobson, RFC 2327: SDP: Session descriptionprotocol, Apr. 1998. Status: PROPOSED STANDARD.

[4] A. B. Johnston, B. D., and D. C., WebRTC: APIs and RTCWEBProtocols of the HTML5 Real-Time Web, Digital Codex LLC, USA, 2012.

[5] J. Lengstorf, Realtime web apps HTML5 WebSocket, Pusher, and theweb’s next big thing, Apress Computer Bookshops distributor, Berkeley,Calif. Birmingham, 2013.

[6] A. Narayanan, C. Jennings, B. A., and B. D., WebRTC 1.0:Real-time communication between browsers, W3C working draft, W3C,Sept. 2013. http://www.w3.org/TR/2013/WD-webrtc-20130910/.

[7] R. Rai, Socket.io Real-time Web Application Development, Packt Pub,Birmingham, 2013.

[8] H. Schulzrinne, S. L. Casner, R. Frederick, and V. Jacobson,RTP: A transport protocol for real-time applications. IETF Request forComments: RFC 3550, jul 2003.

[9] L.-L. Tsahi, Seven reasons for webrtc server-side media processing.http://networkfuel.dialogic.com/webrtc-whitepaper, apr 2014.
