Multimedia at Work

Capturing Conference Presentations

Lawrence A. Rowe, University of California, Berkeley
Vince Casalaina, Image Integration

Editor’s Note: If you work in the multimedia and e-learning areas, you might have heard about the Berkeley MPEG-1 Tools; the Berkeley Multimedia, Interfaces, and Graphics (MIG) Seminar/Lecture Webcasting System; or the Open Mash Streaming Media Toolkit. All of these came from the group led by Larry Rowe. In this issue, we invite Rowe to introduce his latest low-cost system for automated presentation capture, covering both the technology and the process. In particular, he shares some valuable thoughts on improving the quality of the captured material, the process from capture to postproduction, the system’s usability (such as its user-friendly interface), and the media streaming protocols that support playback.
—Qibin Sun, Institute for Infocomm Research


Many organizations have developed technology to capture and stream presentations.1-3 Yet, presentation capture is impractical at many professional meetings and conferences because of high costs. For example, the typical expense of capturing and publishing presentations using conventional technology is $5,000 to $20,000 per day, depending on the capture complexity and how you produce the final product.

We’ve developed an approach that’s similar to the live-to-videotape recording process the broadcast industry uses, except we record compressed material onto a computer disk. Captured media files can be published immediately without offline editing or postproduction, significantly reducing publication cost. We tested our new approach by capturing presentations at the Association for Computing Machinery (ACM) Workshop on Network and Operating System Support for Digital Audio and Video (NOSSDAV) 2005.4

The total cost of the equipment we used in our experiment (including audio, video, and computer equipment) was approximately $12,000. The production team included one production assistant and one person who acted as both webcast producer and director. Based on this experiment, we estimate it’s possible to capture and publish conferences for approximately $3,000 per day plus expenses (such as travel, room, and board). This estimate includes equipment rental.

This article describes the technology and process we used to capture and publish the NOSSDAV presentations. A longer version of this article and a slide show with pictures of the equipment and the room are available at http://bmrc.berkeley.edu/research/nossdav05.

Capture process

The basic idea behind presentation capture is to capture audio, video, and graphics (that is, RGB output from a presentation computer) and encode it into a compressed digital media file that users can replay on demand. The challenge is to capture high-quality images of the projected presentation material inexpensively. Most conference presentations use relatively static slides with transition effects and builds. Occasionally, a presentation will include animations to illustrate dynamic behavior. Some presenters use continuous media (such as audio and video) and live demonstrations in their presentations. Dynamic behaviors, continuous media, and live demonstrations are especially difficult to capture.

The conventional approach to lecture capture is to use one or two cameras focused on the speaker (for example, a close-up of the speaker and a wide-angle shot of the stage) and a wireless microphone to capture the speaker’s audio presentation. Some productions use a third camera to capture audience members as they ask questions.

Problems arise when you try to capture the graphics material projected to the audience. Typically, the computer’s RGB signal is converted to a video signal by using a scan converter or by pointing a camera at the projection screen. Both approaches have limitations because the RGB signal has too much data compared to a video signal. Digitizing and compressing these images discards 50 to 70 percent of the image’s information, which often results in unreadable presentation material.

Another approach is to acquire the presentation source files and create images in postproduction that can be synchronized with the speaker’s audio and video.5 This approach produces high-quality slide images but raises the cost of production unless speakers are constrained to a limited set of presentation packages. Capturing dynamic material is still problematic. Moreover, some speakers won’t provide copies of their files. A prior experiment at ACM Multimedia 2001 used this approach, and the published material contained only 30 percent of the slides.6

In our approach, we directly capture the RGB signal using an NCast Telepresenter G2. (We should note that this article’s first author, Rowe, is a cofounder of and investor in the company.) With this method, the image quality is substantially better, and we capture the dynamic material. The G2 (see http://www.ncast.com/telepresenterG2.html) contains an embedded computer that runs software to digitize audio and RGB signals and then compresses them using MPEG-4 codecs. It can webcast live streams and archive the material in an MP4 file for on-demand replay. The G2 produces material compatible with Internet Engineering Task Force (IETF) and International Telecommunication Union (ITU) standards that users can play using the Apple QuickTime Player. The G2 can be controlled through its embedded Web interface or by a program that accesses it through a Transmission Control Protocol (TCP) or serial connection. The G2’s retail cost is $5,500.

The G2 captures RGB images, so we needed to convert the National Television System Committee (NTSC) video signal produced by cameras recording the speaker into an RGB signal. We used a Kramer VP-720DS seamless switcher (see http://www.kramerelectronics.com), which accepts up to four video inputs and one RGB input and produces an RGB output selected from one of the inputs.

The VP-720DS has been discontinued, but Kramer makes many similar products that can be used for this application. The switcher scales the selected input to the specified output format and uses frame-accurate switching. It also provides a picture-in-picture (PIP) function that will show the RGB signal composed with one of the video signals, or one of the video signals composed with the RGB signal. The retail cost of the VP-720DS is $1,595, although they’re widely available for $1,200.

We used two cameras: a manually controlled camera at the back of the room and a pan, tilt, and zoom (PTZ) camera located in the aisle between the classroom seating tables. Figure 1 shows a schematic of the room and a picture from the podium. We used the manual camera to provide a wide-angle view of the stage and to show people asking questions. The PTZ camera was used for close-ups of the speakers and panel members. The captured presentation is a single video stream that shows the speaker, the presentation material, or the presentation material with the speaker in a PIP window. Figure 2 shows examples of each.

Figure 1. Pictures of the room in which the NOSSDAV conference was held. (a) Room configuration, and (b) view of the room from the speaker’s podium.

Figure 3 shows the equipment configuration we used during the capture. The director (Casalaina) operated the wide-angle camera, an audio mixer to control sound levels, and two GUI applications that ran on a laptop to control the PTZ camera (a Canon VCC4) and the capture and switching hardware.

The house audio system provided a single audio signal that combined output from the wireless microphone, a wired podium microphone, and audio from the speaker’s presentation computer. The podium microphone captured audience questions and speaker introductions.

We designed the control software to be easy to use and to provide only the functions required for lecture capture. Our hope was to automate as much of the production process as possible.

One application controlled the PTZ camera, and a second application controlled the capture and switching hardware. The camera control application provided an interface to pan, tilt, or zoom the camera smoothly at a user-configured speed and to set or recall up to six preset positions.
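
To make this concrete, here is a toy Tcl/Tk sketch of such a preset panel. It is not the authors’ application; the serial device path and the RECALL command string are placeholder assumptions, not the VCC4’s documented protocol.

    # Toy preset panel: six buttons that each send a recall command over
    # a serial line. Device path and command syntax are hypothetical.
    package require Tk

    set tty [open /dev/ttyS1 r+]
    fconfigure $tty -mode 9600,n,8,1 -buffering none

    for {set i 1} {$i <= 6} {incr i} {
        button .preset$i -text "Preset $i" \
            -command [list puts -nonewline $tty "RECALL $i\r"]
        pack .preset$i -side left -padx 2
    }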

The capture/switching application provided functions to control capture (for example, to start, pause, resume, or stop), select the video source (such as the wide-angle camera, close-up camera, or RGB signal), control use of PIP, and configure selected hardware properties such as the capture format (Video Graphics Array [VGA], Super Video Graphics Array [SVGA], or Extended Graphics Array [XGA]).

We wrote the control applications in Tcl/Tk; together they comprise approximately 3,500 lines of code. The code sends commands to the VCC4 camera and the VP-720DS over serial connections and to the G2 over a TCP connection. (More details on these applications, including screen dumps that show the interfaces, are available at http://bmrc.berkeley.edu/research/nossdav05/capture/.)
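
To give a flavor of that control path, the following illustrative Tcl sketch sends one command to the G2 over TCP and reads the reply. The host name, port number, and command string are assumptions for illustration, not the G2’s documented protocol.

    # Illustrative sketch: send one command to the capture box over TCP.
    # Host, port, and command syntax are hypothetical.
    proc g2_send {cmd} {
        set sock [socket g2-capture.local 7000]
        fconfigure $sock -buffering line
        puts $sock $cmd
        set reply [gets $sock]
        close $sock
        return $reply
    }

    puts [g2_send "record start"]    ;# e.g., start a capture

The serial devices work the same way, with open and an -mode setting on the channel in place of socket.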

The conference was scheduled for Monday and Tuesday, so we arrived Sunday to prepare. It took approximately three hours to set up and test the equipment on site.

During the event, the director operated the equipment and monitored the capture. We ran the audio signal through a small mixer so we could easily control sound levels. Video was monitored on an RGB display from the G2 that showed the captured video. The G2 display can be configured to show a sound meter for captured audio. Hence, we were able to verify that the sound was intelligible and the captured signal was acceptable. The production assistant (Rowe) solicited performance releases, helped speakers with RGB output settings, and tweaked the control software. (ACM approved the performance release, which is available at http://bmrc.berkeley.edu/research/nossdav05/capture/acm-vid-release.pdf, before the event.)

Figure 2. Three examples of the material captured for a presentation. (a) A close-up of the speaker, (b) the presentation material, and (c) a composition showing the speaker and the slides using picture-in-picture.

Previous experience suggested that 67 percent of the presenters would sign the release. Presenters often decline because they are uncertain whether they have releases for material used in their talks or because they work for organizations that require corporate lawyers to sign such releases. All the NOSSDAV presenters signed the release, probably because most speakers were from universities.

The G2 stores the captured media files on an internal disk. We captured the conference organizers’ welcome and introduction, presentations for the 33 accepted papers, the keynote address, and nine question-and-answer sessions, which produced 44 media files that occupied 8.7 Gbytes of disk space. Sadly, the video for one talk wasn’t recorded correctly for unknown reasons. It might have been an operator error starting or stopping a capture, or a software bug. We still published the audio for that talk.

It took approximately an hour to tear down the equipment and repack it for transportation after the conference ended. We also made a copy of the media files on a separate disk in case there was a problem on the return trip.

Figure 3. The audio, video, and computer equipment used to capture the presentations. The presenter’s PC, wireless microphone, and presentation projector are in the upper left corner. We brought all the other equipment to capture the event. The schematic depicts the various interconnections and signals (such as video and audio). The producer uses the audio mixer and monitor headphones, the preview and program monitors, and the control PC to capture the presentation. The preview monitor lets the producer set up the camera not currently selected for program output.

Postproduction

As we mentioned earlier, the NCast G2 produces files that users can play on a QuickTime Player. We installed a Darwin Streaming Server (DSS) on a FreeBSD PC located at the University of California, Berkeley, and loaded the captured files onto it. We then played the material using various Windows and Macintosh PCs from different places, including high-speed connections at Berkeley and other universities and broadband connections at home. The captured material didn’t play well for two reasons:

❚ The material was captured at 1.5 megabits per second (Mbps), which is too demanding for many broadband home connections.

❚ The material was captured at the native image size of the presentation, typically XGA, at 30 frames per second (fps). A relatively new PC was required to decode this material.

Consequently, we decided to recode the material so that more people could play it. We had trouble finding an inexpensive software package to transcode the files. We found several packages that appeared to work, but they cost between $400 and $1,000. While we were searching for the best alternative, Apple released QuickTime V7 Pro, which includes the required transcoding functionality, runs on Macintosh and Windows PCs, and costs $30.

After experimenting with different formats, we decided to publish two versions of each presentation: a low-quality version that users can play anywhere and a high-quality version for people with fast network connections and computers. The low-quality version uses 384 × 256 images at 15 fps and requires 600 kilobits per second (Kbps); the high-quality version uses 512 × 384 images at 15 fps and requires 1,200 Kbps. We used the recently released QuickTime H.264 video codec for the published material because the transcoding software supported it and it appeared to produce better results than the MPEG-4 video codec.

Transcoding all the material was time consuming because it required three and nine times real time to produce the 600- and 1,200-Kbps material, respectively. The H.264 codec in QuickTime V7 Pro has one- and two-pass encoders. We used the one-pass encoder even though the results were better with the two-pass encoder, because the two-pass encoder required 40 times real time to transcode a file. We had 14 hours of material to transcode at two settings. Even using the one-pass encoder, it took approximately 170 hours to transcode the material (14 hours × (3 + 9) = 168 hours of encoding time).

We produced Web pages to play the material, including a listing of all talks and popup windows to play each talk. It took some effort, but eventually we were able to get the HTML to work correctly on all Web browsers using the embedded QuickTime Player.

Publication

The conference was held 13–14 June 2005, and we published the material on 1 September 2005. (The presentations are available at http://bmrc.berkeley.edu/research/nossdav05/.) The ACM SIG Multimedia and NOSSDAV Web sites and mailing lists advertised the material’s availability. Users were able to play the material successfully 329 times, which is 60 percent of the attempted plays (548), in the 11 months between September 2005 and July 2006. We’ve omitted from these statistics plays by the site producer during development and testing.

Of the 219 failed attempts, 179 (80 percent) were logged as server timeout errors. These occur when users try to play the material on a computer behind a firewall or network address translation (NAT) router using UDP-based delivery (media streams set up via the Real-Time Streaming Protocol [RTSP]) rather than a TCP-based transport (such as HTTP). Most of these errors occurred in the first two months.

The player or server software didn’t work during November and December 2005, which we discuss in more detail in the next section. We changed the way the videos were played in early January so that all playback used HTTP transport. Since then, we’ve noticed a significant decrease in server timeout errors. The other 40 errors were bad requests (for example, the URL didn’t exist or a wrong format was requested).

Looking at the successful plays, 35 percent used the high-speed version and 64 percent used the low-speed version. The remaining 1 percent played the audio-only talk. We’re surprised that more people didn’t play the high-speed version because we expected that most people interested in the material would be at universities, which typically have high-speed connections that can access the Berkeley server.

Each talk and Q&A session was playedbetween 0 and 40 times with a 7.7 mean numberof plays (standard deviation 9.1). Surprisingly,three talks have never been played.

The most popular talks are

❚ the keynote address, “Multimedia Systems Research: A Retrospective,” by Harrick Vin from the University of Texas (40 plays);

❚ “Supporting P2P Gaming When Players Have Heterogeneous Resources” by Brian Neil Levine from the University of Massachusetts (36 plays); and

❚ “Mirinae: A Peer-to-Peer Overlay Network for Large-Scale Content-Based Publish/Subscribe Systems” by Yongjin Choi from KAIST (31 plays).

The most popular panel discussion, on “Network Gaming,” has been played five times. It included researchers Brian Neil Levine from the University of Massachusetts, Chris Chambers from Portland State University, Grenville Armitage from Swinburne University, and Kuan-Ta Chen from National Taiwan University.

We’re disappointed that the material has been played only one to two times per day. Although we expected replays to decline over time, we thought people interested in the topics who didn’t attend the workshop would play the material. The problem might be publicity, since it’s difficult to advertise the material’s availability, combined with the content’s one-time nature.

Lessons learned

Several things worked well, including the switching and capture hardware and the low-cost model for capture and publication. Although we believe it’s possible to capture and publish a single-track conference for approximately $3,000 a day plus expenses, this price will of course be higher if you use additional equipment. Still, it’s reasonable to expect the cost to remain well under $5,000 per day. Also, we were able to capture all the workshop presentations, regardless of the slide and computer technology the speaker used, including all dynamic material. We believe the resulting published material is of reasonable quality given the playback constraints (network bandwidth and computer processing power) and a few production glitches we’ll discuss shortly. Nevertheless, as in any production, there’s room for improvement.

Improving quality

Generally speaking, the material we captured is good quality, but it can be improved. First, we captured the material at 30 fps using the native resolution of the presenter’s projected material if the resolution was XGA or smaller, and at XGA resolution if it was larger. Although it reduces visual quality, a lower-resolution capture (such as SVGA) at 15 fps is good enough given the constraints of current playback technology.

Scaling higher-resolution images to SVGA and applying typical video coding algorithms produced some “ringing” around text on the slides (that is, ghost edges around the characters). Modern computers are exceptionally good at displaying material at different resolutions, so where possible we should encourage presenters to use a lower resolution when projecting their material. This problem is tied to the bandwidth available for transmitting the material during playback and the decoding efficiency of the playback computer. Over time, these constraints will be relaxed, and it will be practical to capture larger images at higher frame rates.

We could also improve audio capture. Some speakers didn’t use the wireless microphone. The captured audio was good when they stayed at the podium, but sometimes they strayed from the podium or turned to look at the screen, which hurt quality. The obvious advice is to require speakers to use the wireless microphone.

Audio capture of audience questions must be improved because they were sometimes difficult to hear. We thought the podium microphone would pick up most audience questions, which it did. However, sometimes the audience member didn’t speak loudly, and it was difficult for the director to change the sound level quickly enough during interaction between the speaker and audience. We should have used several microphones pointed at the audience and controlled them separately at the mixer to capture questions.

Finally, we had only one wireless microphone. We needed several microphones so the session moderator could always be wired and the next speaker could get ready before it was time to talk.

We also had a minor problem positioning the RGB image on the projection screen and at capture. The projector in the room had a remote control to move the image left/right or up/down, but we didn’t notice the problem during testing. As a result, the RGB images in the first few talks were shifted up and to the left when captured, which led to video noise across the bottom of the captured images. Both the VP-720DS and the G2 have controls to move the image, but we didn’t have access to them in our control software. This problem can be easily fixed.

Another way to improve the captured material’s quality is to use more cameras and provide the director with more control. We didn’t incorporate an audience camera positioned at the front of the room because we didn’t have an extra camera. We will do so in the future, as long as audience members don’t object. And we will use PTZ cameras for all sources rather than a manual camera because it will simplify operation for the director.

Wide-angle views of the stage were unusable when slides were being projected because the bright light bouncing off the screen caused the camera’s auto exposure to close the aperture, which produced a dark image that made it difficult to see the speaker. A good spotlight on the speaker will fix this problem.

NCast has released the Telepresenter M3 with additional production features, including a PIP function and a graphic overlay function that can be used for titling. Reducing the amount of separate hardware simplifies setup and operation and improves reliability.

Lastly, Automatic Sync Technologies (see http://www.automaticsync.com) is a commercial company that offers an automated captioning service for streaming media. It produces a multimedia title, in any of the popular streaming media formats, that scrolls the text of an audio transcript synchronized with the audio/video material. It can also produce a word-based search index to the material. The service costs approximately $185 per hour of source material, which means the NOSSDAV material could be processed for less than $3,000 (14 hours × $185 ≈ $2,600). We think future publication of presentations and discussions at conferences should include this capability with the material they publish.

Improving process

We could also make several changes to improve the capture and postproduction process. First, a preconfigured custom hard-shell case for the production equipment would greatly simplify preparation before an event and setup at the remote location. The case can incorporate a small equipment rack with sound dampening and access to the front and back panels. We can also add small rack-mounted LCD displays for monitoring the various video sources, in place of the heavy, awkwardly sized professional video monitor we used in this experiment. These cases are relatively inexpensive, and many companies will custom design them for a specific application.

Second, we can substantially improve the postproduction and publication process. Because this was the first time we used this approach in a conference setting, it took almost two months to publish the captured material. This delay was caused in part by our having to determine the best playback representation and transcode the material. In future productions, we will change the capture parameters to avoid transcoding. We also had to set up the media server and author Web pages for the conference program and individual presentations. Most of this work only needs to be done once or can be automated.

During the event, we spent considerable time keeping track of the speaker and the recorded file that corresponded to each talk. The G2 identifies the talk by encoding the beginning date and time of the capture into the file name. We copied the material off the G2 by hand and then used scripts to produce the Web pages given the files and information about the talks (for example, title, authors, affiliation, speaker, talk duration, and start time). We can easily automate this step by entering the conference program ahead of time and relating it to the capture files. Moreover, we could open up the G2 interface to the embedded FTP server and automate the entire postproduction process.
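
As a sketch of that automation (the file-name pattern and program entries below are assumptions for illustration, not the G2’s actual naming scheme), a script could pair each timestamp-named capture file with the nearest scheduled talk:

    # Sketch: match capture files named by start date/time to a prepared
    # conference program. File-name pattern and schedule are hypothetical.
    set program {
        {{Welcome and introduction} {2005-06-13 08:30}}
        {{Keynote address}          {2005-06-13 09:00}}
    }

    foreach file [glob -nocomplain capture-*.mp4] {
        # for example, capture-20050613-0858.mp4
        if {![regexp {capture-(\d{8})-(\d{4})\.mp4$} $file -> ymd hm]} continue
        set t [clock scan "$ymd $hm" -format "%Y%m%d %H%M"]
        set best ""; set bestDiff 1e12
        foreach entry $program {
            lassign $entry title start
            set diff [expr {abs($t - [clock scan $start -format "%Y-%m-%d %H:%M"])}]
            if {$diff < $bestDiff} {set bestDiff $diff; set best $title}
        }
        puts "$file -> $best"   ;# feed this mapping to the page-generation scripts
    }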

Finally, several research groups have explored automating the decisions a webcast director makes during a live event.2,7,8 Clearly, this technology should be incorporated into conference presentation capture.

Improving usability

Numerous changes can be made to the control software used to capture the event. First, we need to fix the PIP interface. The control software needs a simple configuration interface that lets the director change the PIP location (to bottom left or right) to more easily adapt to the spatial positioning at the conference venue. If the PIP window is on the lower right side of the image and the speaker is standing to the left of the screen as you face the stage, the speaker’s gesture toward the screen points off the right edge of the captured image. By moving the PIP window to the lower left, the speaker’s gesture points to the slide in the captured image. Figure 4 illustrates this problem. If the speaker is on the screen’s right side, you need to change the PIP position. Hence, the control software must make it easy for the director to change the PIP location dynamically.

This feature is easy to add because the Kramer interface has the function. But one problem with the Kramer is the absence of a function to swap the PIP and main window sources. This function exists in the VP-720DS onscreen display interface, but we couldn’t execute it remotely, even when we tried to mimic the onscreen operations. The device clearly has the function, but it’s unavailable through the serial control interface.

This limitation caused problems because several times the director wanted to swap the PIP and main window images. To do so, he had to turn off the PIP window, switch to the alternative source, and turn the PIP window back on, which was distracting and time consuming. Moreover, it probably increased the compressed bit count because the codec encodes the intermediate images.

To improve camera control, we need to rewrite the PTZ camera control software. The software we used was developed originally for a Canon VCC3, so we used the VCC3 emulation mode on the VCC4 camera. The VCC4 has more functions (such as variable-speed moves) that we can exploit to improve the captured images. The VCC3 has manual iris and focus controls, but we couldn’t get them to work in emulation mode on the VCC4. Presumably, the VCC4’s native interface to these controls works.

Finally, we need to define presets that move the PTZ camera in only one or two dimensions, and we need to add more presets. A preset in the current software defines an absolute setting for pan, tilt, and zoom. Several times the director wanted to pan right or left while keeping the same tilt and zoom settings; in effect, he wanted a delta from the current position rather than an absolute setting. We also need to add groups of presets so the director can easily switch between them. For example, individual speakers and panel sessions require different settings.

Improving playback

We used the QuickTime Player embedded in a Web page to play the recorded material. Several users had problems playing the material. Generally speaking, it worked well on Macs running OS X with the Safari Web browser. Although users were able to play the material using Windows computers and other browsers (such as Firefox and Internet Explorer), most had problems with the streaming transport because they had to configure it manually. Users have no patience for configuring software to view material like these presentations. Playback must work like TV: go to a Web page and it works.

The QuickTime embedded player can transport content using either RTSP or HTTP streaming. Given the state of the Internet today, nearly everyone uses HTTP streaming because of firewalls and NAT routers. However, the player uses RTSP streaming by default, so the user must reset the transport parameter manually. Most users, including experienced computer scientists, were confused by this requirement even though our Web pages described the problem and explained how to change the setting.

Figure 4. Spatial relation between speaker and PIP window. It must be easy for the director to change the PIP location dynamically to avoid having (a) the speaker gesture off screen; preferably, (b) the speaker will gesture onto the screen.

Moreover, a recent release of the QuickTime software for Windows (version 7.0.3) exacerbated this problem. Prior to this release, the user could set the transport to use port 8000 with HTTP streaming. This release doesn’t let users change the port; they must use the default port 80. This restriction (more likely a bug) caused problems because we run the DSS on the same machine as a Web server. We didn’t notice this problem for more than two months because no one notified us that the material was unplayable. The server logs show that people just stopped playing the material.

We fixed this problem by explicitly including the port number (8000) in the RTSP URL we used to launch playback. This port uses HTTP streaming by default, so it removed the requirement that people explicitly set the transport parameter.
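
For example (the server name and path here are hypothetical, not the actual Berkeley URLs), launching playback with rtsp://media.example.edu:8000/nossdav05/talk01.mov directs the player to port 8000, where the stream is served over HTTP, whereas rtsp://media.example.edu/nossdav05/talk01.mov would default to RTSP transport and fail behind many firewalls and NAT routers.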

Conclusion

This experiment demonstrates that it’s possible to capture conference and workshop presentations for on-demand replay for $3,000 per day. We believe professional organizations such as the ACM and IEEE should consider capturing presentations for many, if not all, conferences. Over time, this cost should decline and the quality of the captured material will improve.

Acknowledgments

We thank ACM Special Interest Group Multimedia Chair Ramesh Jain for funding this experiment. We also thank Bobb Bottomley, who is responsible for the audio/video technology at Skamania Lodge, where the conference was held. Lastly, we thank all the speakers who agreed to be captured for posterity.

References

1. L. Rowe et al., BIBS: A Lecture Webcasting System, tech. report, Berkeley Multimedia Research Center, Univ. of California, Berkeley, 2001; http://bmrc.berkeley.edu/bibs-report.

2. Y. Rui et al., “Automating Lecture Capture and Broadcast: Technology and Videography,” Multimedia Systems J., vol. 10, no. 1, 2004, pp. 3-15.

3. A. Steinmetz and M. Kienzle, “The E-Seminar Lecture Recording and Distribution System,” Multimedia Computing and Networking 2001, Proc. Int’l Soc. for Optical Engineering (SPIE), vol. 4312, 2001, pp. 25-36.

4. W.-C. Feng and K. Mayer-Patel, eds., Proc. 15th Int’l Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV), ACM Press, 2005.

5. S. Mukhopadhyay and B. Smith, “Passive Capture and Structuring of Lectures,” Proc. 7th ACM Int’l Conf. Multimedia, ACM Press, 1999, pp. 477-487.

6. SOMA Media, ACM Multimedia 2001: Conference Presentations DVD, ACM Press, 2002.

7. M. Bianchi, “Automatic Video Production of Lectures Using an Intelligent and Aware Environment,” Proc. 3rd Int’l Conf. Mobile and Ubiquitous Multimedia (MUM 04), ACM Press, 2004, pp. 117-123.

8. E. Machnicki and L.A. Rowe, “Virtual Director: Automating a Webcast,” Multimedia Computing and Networking 2002, Proc. Int’l Soc. for Optical Engineering (SPIE), vol. 4673, 2002, pp. 208-225.

Readers may contact the authors at [email protected] and [email protected].

Contact Multimedia at Work editor Qibin Sun at qibin@i2r.a-star.edu.sg.
