
Building digital project rooms for web meetings

Laurent Denoue, Scott Carter, Andreas Girgensohn, Matthew Cooper
FX Palo Alto Laboratory

3174 Porter Dr., Palo Alto, CA 94304, USA

{denoue, carter, andreasg, cooper}@fxpal.com

ABSTRACT
Distributed teams must coordinate a variety of tasks. To do so, they need to be able to create, share, and annotate documents as well as discuss plans and goals. Many workflow tools support document sharing, while other tools support videoconferencing; however, little support exists for connecting the two. In this work we describe a system that allows users to share and mark up content during web meetings. This shared content can provide important conversational props within the context of a meeting; it can also help users review archived meetings. Users can also extract shared content from meetings directly into other workflow tools.

Categories and Subject Descriptors
H.4 [Information Systems Applications]: Miscellaneous

General Terms
Conferencing

Keywords
Web meeting, video, annotation

1. INTRODUCTION
Project rooms are dedicated spaces that allow participants to revisit content in a series of meetings over weeks or months [1]. They involve the “creation, editing, and persistence of flexible documents” to support long-term collaboration around complex tasks. Physical project rooms typically include large shared spaces that a collocated design team can visually scan and utilize with ease. While aspects of the physical design space are invaluable, they increasingly do not align with the reality of modern knowledge work, in which potential participants are widely distributed. Furthermore, many smaller organizations do not have enough real estate to devote an entire room to a single project for weeks on end.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. To copy otherwise, to republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee.
Conference ’14 Fort Collins, CO, USA
Copyright 20XX ACM X-XXXXX-XX-X/XX/XX ...$15.00.


Figure 1: An interactive web meeting. (a) The current user (Sue) sees Tim’s and Manu’s frame archives along the bottom as well as Tim’s currently active stream in the main view. (b) Tim opens Manu’s frame archive in a new window. (c) Tim annotates a frame from Manu’s archive. (d) Tim shares his annotation with the rest of the meeting participants.

In this work, we are interested in supporting purely digital project rooms – online spaces that allow a group of participants to create, view, annotate, and edit documents in a flexible and persistent way. As described in [1], key components of such a room include shared, editable spaces, coordination documents (such as flow charts, to-do lists, and other workflow tools), as well as co-working space. Online tools already exist that focus on editable digital spaces and coordination documents. For example, Trello^1 allows users to create a variety of workflow documents and to populate them with multimedia information. Other videoconferencing tools allow users to share and discuss documents in a meeting environment (e.g., WebEx^2). However, project rooms should support both synchronous and asynchronous collaboration. Furthermore, they should allow documents to flow seamlessly from synchronous use (e.g., showing slides in a meeting) to asynchronous use (e.g., perusing slides from a past meeting or pasting a photo of a slide on the wall so others can refer to it later). Current tools fail to bridge this gap.

^1 https://trello.com/
^2 http://www.webex.com/

Figure 2: Annotations in our system are linked to underlying content. Here, Tim cycles through Manu’s frame archive in a pop-up window (Manu is currently sharing his webcam, which appears in the background in the upper right). Tim first views a slide he previously annotated (left) and then swipes to view the next slide in the deck (center). Next he swipes back to the first slide (right).

We are building a digital project room environment that allows users to collaborate synchronously via web-based meetings as well as asynchronously via a shared document environment. With our system, users can annotate and extract content from live meetings both to improve in situ communication and to allow others to explore important documents later. In this paper we focus in particular on the aspects of our system that allow users to mark up, extract, and archive live content.

2. INTERACTION DESIGN

2.1 Meeting support
Our system is a pure HTML5^3 solution that utilizes WebRTC^4 to connect clients. It relies on WebRTC media capture and streaming^5 to gain access to a user’s webcam and microphone. We use the WebSocket API^6 for communication between the clients and for signalling for the WebRTC connections. Any number of clients may participate in a meeting, limited only by the available network bandwidth and the screen real estate for displaying content from remote participants. We also utilize the Desktop Capture API^7 to allow participants to stream arbitrary screen regions. Crucially, our system makes no other demands on the end-user system. This allows users to share arbitrary documents without forcing other users in a meeting to install third-party applications. Several of these technologies are still under development and are supported by only a few web browsers. Still, we expect more mature and widespread support in the near future.
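To make the client wiring concrete, the sketch below shows one way the pieces above could fit together. All names here (makeSignal, parseSignal, joinMeeting) are ours, not from the paper, and the signaling message shape is an assumption; only standard browser APIs (getUserMedia, WebSocket, RTCPeerConnection) are used.

```javascript
// Sketch: acquire local media and relay WebRTC signaling over a WebSocket.
// The pure message helpers are kept separate from the browser-only calls.

// Wrap an outgoing signaling payload with its type and sender id.
function makeSignal(type, senderId, payload) {
  return JSON.stringify({ type, senderId, payload });
}

// Parse and minimally validate an incoming signaling message.
function parseSignal(raw) {
  const msg = JSON.parse(raw);
  if (!msg.type || !msg.senderId) throw new Error("malformed signal");
  return msg;
}

// Browser-only: open the camera/microphone and the signaling socket, then
// answer offers from remote peers; ICE candidates are relayed the same way.
async function joinMeeting(wsUrl, myId) {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true, audio: true });
  const ws = new WebSocket(wsUrl);
  const pc = new RTCPeerConnection();
  stream.getTracks().forEach((t) => pc.addTrack(t, stream));
  pc.onicecandidate = (e) => {
    if (e.candidate) ws.send(makeSignal("ice", myId, e.candidate));
  };
  ws.onmessage = async (e) => {
    const msg = parseSignal(e.data);
    if (msg.type === "offer") {
      await pc.setRemoteDescription(msg.payload);
      const answer = await pc.createAnswer();
      await pc.setLocalDescription(answer);
      ws.send(makeSignal("answer", myId, answer));
    } else if (msg.type === "ice") {
      await pc.addIceCandidate(msg.payload);
    }
  };
  return pc;
}
```

Because the WebSocket carries only small JSON signaling messages while media flows peer-to-peer, the same socket can also carry application events such as annotation updates.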

A key feature of our system is that it automatically detects and indexes changes in shared content. When a user changes their content stream, e.g., from their webcam to a screen region, or from one screen region to another, our system records the change event and captures a representative frame of the new content. Furthermore, our system can determine that the webcam has shifted focus from one participant to another via live face detection. This is particularly useful for connecting groups of users who have access to pan-tilt camera systems (e.g., Jarvis^8). Content frames are archived immediately to an interactive stack that anyone in the meeting can peruse. Furthermore, any user can re-share content from any other user’s stack in the same meeting. Captured frames can also be extracted from the meeting and posted to a shared project page.

^3 http://www.w3.org/TR/html5/
^4 http://www.w3.org/TR/webrtc/
^5 http://www.w3.org/TR/mediacapture-streams/
^6 http://www.w3.org/TR/websockets/
^7 https://developer.chrome.com/extensions/desktopCapture
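A minimal sketch of the change-event detection, under our own assumptions (the paper does not specify its detector): compare successive RGBA frames pixel by pixel and flag a change when enough pixels differ. The buffers would be the `data` arrays from `CanvasRenderingContext2D.getImageData` on successive captures; the threshold and tolerance values are illustrative.

```javascript
// Returns true when `curr` differs from `prev` on more than `threshold`
// (a fraction) of pixels; per-channel differences below `tolerance` are
// ignored to absorb compression and webcam noise.
function isChangeEvent(prev, curr, threshold = 0.05, tolerance = 16) {
  if (prev.length !== curr.length) throw new Error("frame size mismatch");
  let changed = 0;
  const pixels = prev.length / 4; // RGBA: 4 bytes per pixel
  for (let i = 0; i < prev.length; i += 4) {
    if (
      Math.abs(prev[i] - curr[i]) > tolerance ||         // R
      Math.abs(prev[i + 1] - curr[i + 1]) > tolerance || // G
      Math.abs(prev[i + 2] - curr[i + 2]) > tolerance    // B
    ) {
      changed++;
    }
  }
  return changed / pixels > threshold;
}
```

When this returns true, the current frame would be archived as the representative keyframe for the new content.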

Figure 1 shows an example meeting with three participants. Stacks for each participant show both the currently shared content (on top of the stack) as well as previously captured frames from other content shared in the meeting. In this case, the two remote users (Tim and Manu) have shared a variety of different content types. The local user (Sue) has only shared her webcam and therefore does not have any archived frames. Note that the local user also has extra controls allowing her to switch her shared content between an arbitrary screen and her webcam (similar to other web-conferencing tools, she can also mute her audio stream). Sue is looking at Tim’s stream, which is currently an image of one region of his screen. In Figures 1b and 1c, we see that Tim has opened Manu’s history, has found a slide Manu shared previously, and has annotated it. The annotation snaps to the underlying content, which is detected via real-time image analysis. At this point, only Tim sees these actions. However, he can choose to share this popup window (Figure 1d) so that everyone else can see his annotation while he brings up a new point.

In Figure 2, Tim continues to look through Manu’s previously shared content and then returns to the slide he annotated. Note that the annotation remains, via the content matching algorithms we describe below.

2.2 Archiving meetings
Frames from each user are continuously captured at regular intervals. A keyframe is archived to the user’s history whenever a screen change event is detected, while other frames are discarded. Users can also optionally record their complete streams (both audio and video) during a meeting. In our system, recorded content is collected in the client browser and sent to the server opportunistically. The server maintains a database of all recorded meetings and makes archived content available via a web server.
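The opportunistic upload can be sketched as a small client-side buffer; the class, parameter names, and flush policy below are hypothetical (the paper does not describe its batching), and the transport is injected so a real client could pass a `fetch` POST.

```javascript
// Hypothetical sketch: buffer recorded chunks on the client and flush them
// to the server whenever enough bytes have accumulated.
class OpportunisticUploader {
  // `send` is an async function that receives an array of chunks.
  constructor(send, flushBytes = 256 * 1024) {
    this.send = send;
    this.flushBytes = flushBytes;
    this.buffer = [];
    this.buffered = 0;
  }

  // Queue one recorded chunk (e.g., a MediaRecorder dataavailable blob);
  // flush once the buffered size crosses the threshold.
  async push(chunk) {
    this.buffer.push(chunk);
    this.buffered += chunk.byteLength;
    if (this.buffered >= this.flushBytes) await this.flush();
  }

  // Send everything buffered so far and reset; a no-op when empty.
  async flush() {
    if (this.buffer.length === 0) return;
    const batch = this.buffer;
    this.buffer = [];
    this.buffered = 0;
    await this.send(batch);
  }
}
```

Decoupling recording from upload this way keeps the meeting responsive even when the network is briefly saturated by the live streams.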

^8 http://www.fxpal.com/research-projects/jarvis/


Figure 3: An example project page. Participants can view archived meetings, and notes and media can be attached to an archived meeting or directly to the project. Participants can continue an archived meeting or start a new one at any time.

Users can manipulate a slider (Figure 1a, top) to peruse archived meeting content. As the user moves the slider back, each user’s stack is updated to show the archived frame corresponding to that time. During a meeting, the ability to peruse past content can be useful both for participants just joining the conference to catch up [3] and for ongoing participants who might want to revisit previously discussed topics. When no participants are actively streaming content, the user can click a play button to watch the conference. This plays through the video in the main video view. If video streams from multiple participants were recorded, the playback switches among the streams based on speaker identification. Alternatively, the viewer can select one of the participants to keep the focus there.

Note that users can join or rejoin a meeting at any time. Color-coded timelines just below the slider represent the periods of the meeting during which each user was present. If a portion of a client’s stream was not recorded, either because they were not in the meeting at that time or because they switched off recording, the user’s stack shows their last archived frame.

2.3 Projects
Our goal is for meetings to be integrated into a higher-level project page that includes links to a set of related meetings as well as other multimedia and workflow tools. For example, a working group could archive a series of development meetings, design meetings, and HR meetings as well as a shared to-do list on the same page.

We are currently exploring integrating our web meeting tool into open-source organization frameworks (e.g., Redmine^9). Simultaneously, we are building a lightweight project page that allows us to experiment with representing and organizing meetings as well as content extracted from meetings. Figure 3 shows an example project page in which a meeting is represented by the history stacks of each participant. Users can drag arbitrary media into the project page, either from their own laptop or from a meeting.

The fact that our annotation approach operates directly on shared screen content, rather than any specific document format, means that users can mark up any content, including, for example, photos of sketches they have made. Here, one user has extracted an annotated slide from the meeting, while another has uploaded and annotated a photo of a sketch.

Users can start a new meeting that will be associated with this project by clicking the “+ MEETING” button. They can also add multimedia notes directly to the project page (“TODO...” in the example). Furthermore, users can return to archived meetings at any time to peruse past content or continue the meeting.

3. ENHANCING ANNOTATION USING REAL-TIME ANALYSIS

It is common for one user to annotate the screen content shared by another conference participant. The keyframe history in our interface provides additional opportunities for users to review and optionally mark up previously shared content throughout ongoing meetings.

3.1 Creating annotations
Remote participants do not have an easy way to mark up this content. Previous work has focused on transient traces or enhanced remote pointers, but these markups are not linked to the underlying content and are usually freeform. We utilize real-time image processing to detect and segment the underlying content in the shared screen. When users elect to annotate (e.g., highlight) underlying content, we integrate the results of this analysis to interpret their interaction.

Our approach for segmenting the underlying content computes a binary version of new frames as they are captured. Content streams such as screens and webcams are rendered into an HTML5 Canvas element, and the pixel data is accessed to binarize the frame and compute the boxes. Connected components [2] are then extracted, and their bounding boxes are sorted and grouped to approximate word boundaries. Currently, we set a spacing threshold of six pixels to decide word groups and also filter out boxes that are less than ten pixels wide or high.
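The grouping step can be sketched as follows. The connected-component extraction itself (per [2]) is assumed done; the function name, box shape `{x, y, w, h}`, single-row simplification, and filter ordering are our assumptions, with the six-pixel spacing and ten-pixel size thresholds taken from the text.

```javascript
// Illustrative sketch: merge component boxes whose horizontal gap is at
// most `spacing` into word boxes, then drop boxes below the minimum size.
function groupWords(boxes, spacing = 6, minSize = 10) {
  // Sort left-to-right; a full implementation would first cluster by row.
  const sorted = [...boxes].sort((a, b) => a.x - b.x);
  const words = [];
  for (const b of sorted) {
    const last = words[words.length - 1];
    if (last && b.x - (last.x + last.w) <= spacing) {
      // Gap within the spacing threshold: extend the current word box.
      const x2 = Math.max(last.x + last.w, b.x + b.w);
      const y2 = Math.max(last.y + last.h, b.y + b.h);
      last.x = Math.min(last.x, b.x);
      last.y = Math.min(last.y, b.y);
      last.w = x2 - last.x;
      last.h = y2 - last.y;
    } else {
      words.push({ ...b });
    }
  }
  // Discard boxes below the minimum size (noise and stray marks).
  return words.filter((w) => w.w >= minSize && w.h >= minSize);
}
```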

Users are shown this canvas element, refreshed in real time as new video frames arrive. When a user starts a drag operation over the canvas, the system collects the underlying bounding boxes and draws a yellow filled rectangle over the canvas, giving users the impression that they are highlighting the text underneath. Because the rectangle snaps to the connected component boxes, users can create clean-looking marks as if they were working with an original document, i.e., their highlight neatly follows the content.
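The snapping behavior described above can be sketched as a pure geometric step (function names and the null-on-miss behavior are ours): collect the word boxes touched by the drag rectangle and draw the union of those boxes as the highlight.

```javascript
// Axis-aligned rectangle overlap test; rectangles are {x, y, w, h}.
function intersects(a, b) {
  return a.x < b.x + b.w && b.x < a.x + a.w &&
         a.y < b.y + b.h && b.y < a.y + a.h;
}

// Returns the snapped highlight rectangle (the union of all word boxes
// touched by the drag), or null if the drag touched no detected words.
function snapHighlight(drag, wordBoxes) {
  const hit = wordBoxes.filter((b) => intersects(drag, b));
  if (hit.length === 0) return null;
  const x1 = Math.min(...hit.map((b) => b.x));
  const y1 = Math.min(...hit.map((b) => b.y));
  const x2 = Math.max(...hit.map((b) => b.x + b.w));
  const y2 = Math.max(...hit.map((b) => b.y + b.h));
  return { x: x1, y: y1, w: x2 - x1, h: y2 - y1 };
}
```

Snapping to the union rather than the raw drag rectangle is what makes a rough gesture look like a highlight applied to the original document.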

^9 http://www.redmine.org/



Figure 4: Tracking annotations; word bounding boxes are shown in blue. (a) The user has highlighted two words. (b) The presenter then shifts the window to the left; the system finds these two word boxes in the new frame and repositions the highlight correctly.

3.2 Tracking annotations
We are also developing algorithms that allow real-time repositioning of marks created by users. During the natural progression of a meeting, a user may move or scroll a window that has been previously annotated, as in Figure 2. The system stores the word bounding boxes that are found under the highlight and then needs to find where this highlight should be repositioned when new frames arrive.

Our current implementation finds a match between the new bounding boxes and the boxes of the highlight based on their dimensions. While this algorithm is very simple, we have found it to be fairly robust to the natural translations that occur in meetings. There are a number of extensions to this basic approach that could further improve accuracy, such as processing the boxes’ content or using spatial context to guide the matching. Document image processing includes a number of relevant methods for word shape analysis [4]. When a match is found, the system repositions the highlight(s) at the estimated new location, as shown in Figure 4.
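A simplified sketch of the dimension-based matching (function names and the 2-pixel tolerance are ours): look for a single translation that maps every stored box under the highlight onto a new-frame box of nearly identical width and height.

```javascript
// Two boxes have "the same" dimensions when width and height each differ
// by at most `tol` pixels.
function sameDims(a, b, tol) {
  return Math.abs(a.w - b.w) <= tol && Math.abs(a.h - b.h) <= tol;
}

// Returns {dx, dy} when all stored boxes are found at a consistent offset
// in the new frame's boxes, else null (the highlight stays hidden for
// this frame and may reappear when the content returns).
function findTranslation(storedBoxes, newBoxes, tol = 2) {
  const first = storedBoxes[0];
  for (const cand of newBoxes.filter((b) => sameDims(first, b, tol))) {
    const dx = cand.x - first.x;
    const dy = cand.y - first.y;
    const allFound = storedBoxes.every((s) =>
      newBoxes.some(
        (b) =>
          sameDims(s, b, tol) &&
          Math.abs(b.x - (s.x + dx)) <= tol &&
          Math.abs(b.y - (s.y + dy)) <= tol
      )
    );
    if (allFound) return { dx, dy };
  }
  return null;
}
```

Requiring every stored box to agree on one offset is what makes longer highlights (spanning several word boxes) much less likely to latch onto an unrelated word of similar size.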

This approach works well when the highlights are long enough (e.g., span more than three word boxes). It has allowed us to test the usefulness of a system that can reposition marks in real time: users don’t have to worry about moving windows or scrolling documents during a web conference. Marks created by peers are automatically moved and reappear when the same content is shown again, even if at different positions. In a typical scenario, Sue shares her screen to ask for feedback about a slide deck she is writing for the team. Team members can highlight parts of the slides, and when Sue moves her PowerPoint window on her screen or flips back and forth through slides, any previous marks are automatically redrawn.

4. CONCLUSIONS AND FUTURE WORK
In this work, we introduced a system designed to help remote users conduct and archive web-based meeting content such that it can be organized and extracted to digital project rooms. Our solution is novel in that it 1) automatically and in real time detects and bookmarks different content types from each user’s stream (different faces, screens, etc.); and 2) allows users to add annotations linked to content in real time. In this way, users can share and annotate arbitrary content types without forcing others to use any third-party software. Furthermore, because content is archived in real time, users can quickly recover any previously shared content from the current meeting or another archived meeting.

We plan to continue building our web-based conferencing and meeting archive system, with particular focus on the design of interfaces for exploring archived meetings. We aim to exploit meetings’ persistence in a digital project room to support the types of collaboration that physical project rooms afford to the objects they contain. Archived meetings can be reviewed and augmented by additional meeting participants subsequent to the original synchronous meeting, via interaction with the meeting’s archived content. We will specifically explore the usefulness of the system’s markup and multimedia annotation tools to support this mode of collaboration.

5. REFERENCES
[1] Covi, L. M., Olson, J. S., Rocco, E., Miller, W. J., and Allie, P. A Room of Your Own: What Do We Learn about Support of Teamwork from Assessing Teams in Dedicated Project Rooms? Proc. of Cooperative Buildings: Integrating Information, Organization and Architecture, 53–65, 1998.

[2] Chang, F., Chen, C.-J., and Lu, C.-J. A linear-time component-labeling algorithm using contour tracing technique. Computer Vision and Image Understanding, 93(2):206–220, February 2004.

[3] Junuzovic, S., Inkpen, K., Hegde, R., Zhang, Z., Tang, J., and Brooks, C. What Did I Miss?: In-meeting Review Using Multimodal Accelerated Instant Replay (AIR) Conferencing. Proc. of the SIGCHI Conference on Human Factors in Computing Systems, 513–522, 2011.

[4] Lu, S., Li, L., and Tan, C. L. Document Image Retrieval through Word Shape Coding. IEEE Transactions on Pattern Analysis and Machine Intelligence, 30(11):1913–1918, November 2008.