
Volume 6, Number 4, December 2014

Published by the Association for Computing Machinery Special Interest Group on Multimedia

ISSN 1947-4598
http://sigmm.org/records

Table of Contents

1 Volume 6, Issue 4, December 2014 (ISSN 1947-4598)

1 Report from SLAM 2014

1 Launching the first-ever National Data Science Bowl

2 MPEG Column: Press release for the 109th MPEG meeting

4 openSMILE:) The Munich Open-Source Large-scale Multimedia Feature Extractor

12 Papers

13 Call for Workshop Proposals @ ACM Multimedia 2015

13 MPEG Column: 110th MPEG Meeting

15 Summary of the 5th BAMMF

17 NSF/Intel Partnership on Visual and Experiential Computing (VEC)

17 PhD Thesis Summaries

17 Jason J Quinlan

18 Lilian Calvet

19 Recently published

19 ACM TOMM, Volume 11, Issue 2

19 MMTC R-Letter Volume 5, Issue 6

19 MTAP Volume 73, Issue 3

21 MTAP Volume 74, Issue 1

21 Job Opportunities

21 PhD Position in Layered Video Distribution over ICN

22 PhD Studentships (Video Streaming/SDN) in Cork Ireland

23 Postdoc on Machine Learning and Applications at the Australian National University

23 Calls for Contribution

23 CFPs: Sponsored by ACM SIGMM

24 CFPs: Sponsored by ACM (any SIG)

25 CFPs: Sponsored by IEEE (any TC)

26 CFPs: Not ACM-/IEEE-sponsored

29 Back Matter

29 Notice to Contributing Authors to SIG Newsletters

29 Impressum


Volume 6, Issue 4, December 2014 (ISSN 1947-4598)


Report from SLAM 2014

ISCA/IEEE Workshop on Speech, Language and Audio in Multimedia

Following SLAM 2013 in Marseille, France, SLAM 2014 was the second edition of the workshop, held in Malaysia as a satellite of Interspeech 2014. The workshop was organized over two days, one for science and one for socializing and community building. With about 15 papers and 30 attendees, the highly-risky second edition of the workshop showed the will to build a strong scientific community at the frontier of speech and audio processing, natural language processing and multimedia content processing.

The first day featured talks covering various topics related to speech, language and audio processing applied to multimedia data. Two keynotes from Shri Narayanan (University of Southern California) and Min-Yen Kan (National University of Singapore) nicely completed the program.

The second day took us on a tour of Penang followed by a visit to the campus of Universiti Sains Malaysia, the home institution of the local organizers. The tour offered plenty of opportunities to strengthen the links between participants and build a stronger community, as expected. Most participants later went on to Singapore to attend Interspeech, the main conference in the domain of speech communication, where further discussions went on.

We hope to collocate the next SLAM edition with a multimedia conference such as ACM Multimedia in 2015. Keep posted!

Workshop chairs: Tan Tien Ping, Wong Li Pei, Guillaume Gravier
Report by Guillaume Gravier
Workshop URL: http://language.cs.usm.my/SLAM2014

Location: Penang, Malaysia
Dates: 11-12 September 2014

Launching the first-ever National Data Science Bowl

What is the National Data Science Bowl?

Take a deep dive and see how tiny plants and animals fuel the world

We are pioneering a new language to understand our incredibly beautiful and complex world. A language that is forward-looking rather than retrospective, different from the words of historians and famed novelists. It is data science; and through it, we have the power to use insights from our past to build an unprecedented future. We need your help building that future. The 2014/2015 National Data Science Bowl offers tremendous potential to modernize the way we understand and address a major environmental challenge: monitoring the health of our oceans.

Compete

ACM is a partner in the first-ever National Data Science Bowl, which launched on December 15, 2014.

This 90-day competition offers data scientists the chance to solve a critical problem facing our world's oceans using the power of data.

Participants are challenged to examine nearly 100,000 underwater images to develop an algorithm that will enable researchers to monitor certain sea life at a speed and scale never before possible.

$175,000 in prize money goes to the top three individual contestants and the top academic team.

More information at http://www.datasciencebowl.com


MPEG Column: Press release for the 109th MPEG meeting

MPEG collaborates with SC24 experts to develop committee draft of MAR reference model

SC 29/WG 11 (MPEG) is pleased to announce that the Mixed and Augmented Reality Reference Model (MAR RM), developed jointly and in close collaboration with SC 24/WG 9, has reached Committee Draft status at the 109th WG 11 meeting. The MAR RM defines not only the main concepts and terms of MAR, but also its application domain and an overall system architecture that can be applied to all MAR systems, regardless of the particular algorithms, implementation methods, computational platforms, display systems, and sensors/devices used. The MAR RM can therefore be used as a consultation source to aid in the development of MAR applications or services, business models, or new (or extensions to existing) standards. It identifies representative system classes and use cases with respect to the defined architecture, but does not specify technologies for the encoding of MAR information, or interchange formats.

2nd edition of HEVC includes scalable and multi-view video coding

At the 109th MPEG meeting, the standard development work was completed for two important extensions to the High Efficiency Video Coding standard (ISO/IEC 23008-2, also standardized by ITU-T as Rec. H.265). The first of these are the scalability extensions of HEVC, known as SHVC, adding support for embedded bitstream scalability in which different levels of encoding quality are efficiently supported by adding or removing layered subsets of encoded data. The other are the multiview extensions of HEVC, known as MV-HEVC, providing efficient representation of video content with multiple camera views and optional depth map information, such as for 3D stereoscopic and autostereoscopic video applications. MV-HEVC is the 3D video extension of HEVC, and further work for more efficient coding of 3D video is ongoing.

SHVC and MV-HEVC will be combined with the original content of the HEVC standard and also the recently-completed format range extensions (known as RExt), so that a new edition of the standard will be published that contains all extensions approved up to this time.

In addition, the finalization of reference software and a conformance test set for HEVC was completed at the 109th meeting, as ISO/IEC 23008-5 and ISO/IEC 23008-8, respectively. These important standards will greatly help industry achieve effective interoperability between products using HEVC and provide valuable information to ease the development of such products.

In consideration of the recent dramatic developments in video coding technology, including the completion of the development of the HEVC standard and several major extensions, MPEG plans to host a brainstorming event during its 110th meeting which will be open to the public. The event will be co-hosted by MPEG's frequent collaboration partner in video coding standardization work, the Video Coding Experts Group (VCEG) of ITU-T Study Group 16. More information on how to register for the event will be available at http://mpeg.chiariglione.org/meetings/110.

MPEG-H 3D Audio extended to lower bit rates

At its 109th meeting, MPEG selected technology for Version II of the MPEG-H 3D Audio standard (ISO/IEC 23008-3) based on responses submitted to the Call for Proposals issued in January 2013. This follows the selection of Version I technology, which was chosen at the 105th meeting, in August 2013. While Version I technology was evaluated for bitrates between 1.2 Mb/s and 256 kb/s, Version II technology is focused on bitrates between 128 kb/s and 48 kb/s.

The selected technology supports content in multiple formats: channel-based, channels and objects (C+O), and scene-based Higher Order Ambisonics (HOA). A total of six submissions were reviewed: three for coding C+O content and three for coding HOA content. The selected technologies for Version II were shown to be within the framework of the unified Version I technology. The submissions were evaluated using a comprehensive set of subjective listening tests in which the resulting statistical analysis guided the selection process. At the highest bitrate of 128 kb/s for the coding of a signal supporting a 22.2 loudspeaker configuration, both of the selected technologies had performance of "Good" on the MUSHRA subjective quality scale. It is expected that the C+O and HOA Version II technologies will be merged into a unified architecture. MPEG-H 3D Audio Version II is expected to reach Draft International Standard by June 2015.

The 109th meeting also saw the technical completion of Version I of the MPEG-H 3D Audio standard, which is expected to become an International Standard by February 2015.

Public seminar for media synchronization planned for 110th MPEG meeting in October

A public seminar on Media Synchronization for Hybrid Delivery will be held on the 22nd of October 2014 during the 110th MPEG meeting in Strasbourg. The purpose of this seminar is to introduce MPEG's activity on media stream synchronization for heterogeneous delivery environments, including hybrid environments employing both broadcast and broadband networks, with existing MPEG systems technologies such as MPEG-2 TS, DASH, and MMT. The seminar will also strive to ensure alignment of its present and future projects with user and industry use-case needs. Main topics covered by the seminar interventions include:

• Hybrid Broadcast – Broadband distribution for UHD deployments and 2nd screen content

• Inter Destination Media Synchronization

• MPEG Standardization efforts on Time Line Alignment of media contents

• Audio Fingerprint based Synchronization

You are invited to join the seminar to learn more about MPEG activities in this area and to work with us to further develop technologies and standards supporting new applications of rich and heterogeneous media delivery. The seminar is open to the public and registration is free of charge.

First MMT Developers' Day held at MPEG 109, second planned for MPEG 110

Following the recent finalization of the MPEG Media Transport standard (ISO/IEC 23008-1), MPEG hosted an MMT Developers' Day to better understand the rate of MMT adoption and to provide a channel for MPEG to receive comments from industry about the standard. During the event four oral presentations were given, including "Multimedia transportation technology and status in China", "MMT delivery considering bandwidth utilization", "Fast channel change / Targeted Advertisement insertion over hybrid media delivery", and "MPU Generator." In addition, seven demonstrations were presented, such as Reliable 4K HEVC Realtime Transmission by using MMT-FEC, MMT Analyzer, Applications of MMT content through Broadcast, Storage, and Network Delivery, Media Delivery Optimization with the MMT Cache Middle Box, MMT-based Transport Technology for Advanced Services in Super Hi-Vision, targeted ad insertion and multi-view content composition in a broadcasting system with MMT, and QoS management for Media Delivery. MPEG is planning to host a 2nd MMT Developers' Day during the 110th meeting on Wednesday, Oct 22nd.

Seminar at MPEG 109 introduces MPEG's activity for Free Viewpoint Television

A seminar for FTV (Free Viewpoint Television) was held during the 109th MPEG meeting in Sapporo. FTV is an emerging visual media technology that will revolutionize the viewing of 3D scenes to facilitate a more immersive experience by allowing users to freely navigate the view of a 3D scene as if they were actually there. The purpose of the seminar was to introduce MPEG's activity on FTV to interested parties and to align future MPEG standardization of FTV technologies with user and industry needs.

Digging Deeper – How to Contact MPEG

Communicating the large and sometimes complex array of technology that the MPEG Committee has developed is not a simple task. Experts, past and present, have contributed a series of tutorials and vision documents that explain each of these standards individually. The repository is growing with each meeting, so if something you are interested in is not yet there, it may appear shortly – but you should also not hesitate to request it. You can start your MPEG adventure at http://mpeg.chiariglione.org/

Further Information

Future MPEG meetings are planned as follows:

• No. 110, Strasbourg, FR, 20 – 24 October 2014

• No. 111, Geneva, CH, 16 – 20 February 2015

• No. 112, Warsaw, PL, 22 – 26 June 2015

For further information about MPEG, please contact:

Dr. Leonardo Chiariglione (Convenor of MPEG, Italy)
Via Borgionera, 103
10040 Villar Dora (TO), Italy
Tel: +39 011 935 04
[email protected]

or

Dr. Arianne T. Hinds
Cable Television Laboratories
858 Coal Creek Circle
Louisville, Colorado 80027 USA
Tel: +1 303 661
[email protected]

The MPEG homepage also has links to other MPEG pages that are maintained by the MPEG subgroups. It also contains links to public documents that are freely available for download by those who are not MPEG members. Journalists that wish to receive MPEG Press Releases by email should contact Dr. Arianne T. Hinds.

Source: Convenor of MPEG


openSMILE:) The Munich Open-Source Large-scale Multimedia Feature Extractor

A tutorial for version 2.1

Introduction

The openSMILE feature extraction and audio analysis tool enables you to extract large audio (and recently also video) feature spaces incrementally and fast, and to apply machine learning methods to classify and analyze your data in real time. It combines acoustic features from Music Information Retrieval and Speech Processing, as well as basic computer vision features. Large, standard acoustic feature sets are included and usable out-of-the-box to ensure comparable standards in feature extraction in related research.

The purpose of this article is to briefly introduce openSMILE, its features, potentials, and intended use-cases, as well as to give a hands-on tutorial packed with examples that should get you started quickly with using openSMILE.

About openSMILE

SMILE is originally an acronym for Speech & Music Interpretation by Large-space feature Extraction. Due to the recent addition of video processing in version 2.0, the acronym openSMILE evolved to open-Source Media Interpretation by Large-space feature Extraction.

The development of the toolkit was started at Technische Universität München (TUM) for the EU-FP7 research project SEMAINE. The original primary focus was on state-of-the-art acoustic emotion recognition for emotionally aware, interactive virtual agents. After the project, openSMILE was continuously extended into a universal audio analysis toolkit. It has been used and evaluated extensively in the series of INTERSPEECH challenges on emotion, paralinguistics, and speaker states and traits: from the first INTERSPEECH 2009 Emotion Challenge up to the upcoming Challenge at INTERSPEECH 2015 (see openaudio.eu for a summary of the challenges). Since 2013 the code-base has been transferred to audEERING and the development is continued by them under a dual-license model – keeping openSMILE free for the research community.

openSMILE is written in C++ and is available both as a standalone command-line executable and as a dynamic library. The main features of openSMILE are its capability of on-line incremental processing and its modularity. Feature extractor components can be freely interconnected to create new and custom features, all via a simple text-based configuration file. New components can be added to openSMILE via an easy binary plug-in interface and an extensive internal API. Scriptable batch feature extraction is supported just as well as live on-line extraction from live recorded audio streams. This enables you to build and design systems on off-line databases, and then use exactly the same code to run your developed system in an interactive on-line prototype or even product.

openSMILE is intended as a toolkit for researchers and developers, but not for end-users. It thus cannot be configured through a Graphical User Interface (GUI). However, it is a fast, scalable, and highly flexible command-line backend application, on which several front-end applications could be based. Examples are network interface components and, in the latest release of openSMILE (version 2.1), a batch feature extraction GUI for Windows platforms:

As seen in the above figure, the GUI allows you to easily choose a configuration file, the desired output files and formats, and to select files and folders on which to run the analysis.

Made popular in the field of speech emotion recognition and paralinguistic speech analysis, openSMILE is now being widely used in this community. According to Google Scholar, the two papers on openSMILE ([Eyben10] and [Eyben13a]) are currently cited over 380 times. Research teams across the globe are using it for several tasks, including paralinguistic speech analysis, such as alcohol intoxication detection, in VoiceXML telephony-based spoken dialogue systems — as implemented by the HALEF framework, natural, speech-enabled virtual agent systems, and human behavioural signal processing, to name only a few examples.

Key Features

The key features of openSMILE are:

.

openSMILE:) The Munich Open-Source Large-scale Multimedia Feature Extractor

ACM SIGMM RecordsVol. 6, No. 4, December 2014 5

ISSN 1947-4598http://sigmm.org/records

• It is cross-platform (Windows, Linux, Mac, new in 2.1: Android).

• It offers both incremental processing and batch processing.

• It efficiently extracts a large number of features very fast by re-using already computed values.

• It has multi-threading support for parallel feature extraction and classification.

• It is extensible with new custom components and plug-ins.

• It supports audio file in- and output as well as live sound recording and playback.

• The computation of MFCC, PLP, (log-)energy, and delta regression coefficients is fully HTK compatible.

• It has a wide range of general audio signal processing components:

• Windowing functions (Hamming, Hann, Gauss, Sine, …),

• Fast-Fourier Transform,

• Pre-emphasis filter,

• Finite Impulse Response (FIR) filterbanks,

• Autocorrelation,

• Cepstrum,

• Overlap-add re-synthesis,

• … and speech-related acoustic descriptors:

• Signal energy,

• Loudness based on a simplified sub-band auditory model,

• Mel-/Bark-/Octave-scale spectra,

• MFCC and PLP-CC,

• Pitch (ACF and SHS algorithms and Viterbi smoothing),

• Voice quality (Jitter, Shimmer, HNR),

• Linear Predictive Coding (LPC),

• Line Spectral Pairs (LSP),

• Formants,

• Spectral shape descriptors (Roll-off, slope, etc.),

• … and music-related descriptors:

• Pitch classes (semitone spectrum),

• CHROMA and CENS features.

• It supports multi-modal fusion on the feature level through openCV integration.

• Several post-processing methods for low-level descriptors are included:

• Moving average smoothing,

• Moving average mean subtraction and variance normalization (e.g. for on-line Cepstral mean subtraction),

• On-line histogram equalization (experimental),

• Delta regression coefficients of arbitrary order,

• Binary operations to re-combine descriptors.

• A wide range of statistical functionals for feature summarization is supported, e.g.:

• Means, Extremes,

• Moments,

• Segment statistics,

• Sample-values,

• Peak statistics,

• Linear and quadratic regression,

• Percentiles,

• Durations,

• Onsets,

• DCT coefficients,

• Zero-crossings.

• Generic and popular data file formats are supported:

• Hidden Markov Toolkit (HTK) parameter files (read/write)

• WEKA Arff files (currently only non-sparse) (read/write)

• Comma separated value (CSV) text (read/write)

• LibSVM feature file format (write)

In the latest release (2.1) the new features are:

• Integration and improvement of the emotion recognition models from openEAR,

• LSTM-RNN based voice-activity detector prototype models included,

• Fast linear SVM sink component which supports linear kernel SVM models trained with the WEKA SMO classifier,

• LSTM-RNN JSON network file support for networks trained with the CURRENNT toolkit,

• Spectral harmonics descriptors,

• Android support,

• Improvements to configuration files and command-line options,

• Improvements and fixes.

openSMILE’s architecture

openSMILE has a very modular architecture, designed for incremental data-flow. A central dataMemory component hosts shared memory buffers (known as dataMemory levels) to which a single component can write data and from which one or more other components can read data. There are data-source components, which read data from files or other external sources and introduce them to the dataMemory. Then there are data-processor components, which read data, modify them, and save them to a new buffer – these are the actual feature extractor components. In the end, data-sink components read the final data and save them to files or digest them in other ways (classifiers etc.):

As all components which process data and connect to the dataMemory share some common functionality, they are all derived from a single base class cSmileComponent. The following figure shows the class hierarchy, and the connections between the cDataWriter and cDataReader components to the dataMemory (dotted lines).

Getting openSMILE and the documentation

The latest openSMILE packages can be downloaded here. At the time of writing, the most recent release is 2.1. Grab the complete package of the latest release. This includes the source code and the binaries for Linux and Windows. The most up-to-date releases might not always include a full-blown set of binaries for all platforms, so sometimes you might have to compile from source, if you want the latest cutting-edge version.

While the tutorial in the next section should give you a good quick-start, it does not and cannot cover every detail of openSMILE. For learning more and getting further help, there are three main resources: The first is the openSMILE documentation, called the openSMILE book. It contains detailed instructions on how to install, compile, and use openSMILE and introduces you to the basics of openSMILE. However, it might not be the most up-to-date resource for the newest features. Thus, the second resource is the on-line help built into the binaries. This provides the most up-to-date documentation of available components and their options and features. We will tell you how to use the on-line help in the next section. If you cannot find your answer in either of these resources, you can ask for help in the discussion forums on the openSMILE website or read the source code.

Quick-start tutorial

You can't wait to get openSMILE and try it out on your own data? Then this is your section. In the following, the basic concepts of openSMILE are described, pre-built use-cases of automatic, on-line voice activity detection and speech emotion recognition are presented, and the concept of configuration files and the data-flow architecture are explained.

a. Basic concepts

Please refer to the openSMILE book for detailed installation and compilation instructions. Here we assume that you have a compiled SMILExtract binary (optionally with PortAudio support, if you want to use the live audio recording examples below), with which you can run:
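For example, a typical pair of invocations (a sketch assuming the standard -h and -H help options of SMILExtract) is:

SMILExtract -h
SMILExtract -H cWaveSource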

to see general usage instructions (first line) and the on-line help for the cWaveSource component (second line), for example.

However, from this on-line help it is hard to get a general picture of the openSMILE concepts. We thus describe briefly how to use openSMILE for the most common tasks.

Loosely speaking, the SMILExtract binary can be seen as a special kind of code interpreter which executes custom configuration scripts. What openSMILE actually does in the end when you invoke it is controlled entirely by this configuration script. So, in order to do something with openSMILE you need:

• The binary SMILExtract,

• a (set of) configuration file(s),

• and optionally other files, such as classification models, etc.

The configuration file defines all the components that are to be used as well as their data-flow interconnections. All the components are iteratively run in the "tick-loop", i.e. a run method (tick()) of each component is called in every loop iteration. Each component then checks if there are new data to process, and if yes, processes the data, and makes them available for other components to process further. Every component returns a status value, which indicates whether the component has processed data or not. If no component has had any further data to process, the end of the data input (EOI) is assumed. All components are switched to an EOI state and the tick-loop is executed again to process data which require special attention at the end of the input, such as delta-regression coefficients. Since version 2.0-rc1, multi-pass processing is supported, i.e. the whole processing can be re-run. It is not encouraged to use this, since it breaks incremental processing, but for some experiments it might be necessary.

The minimal, generic use-case scenario for openSMILE is thus as follows:
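A sketch of this generic invocation (the configuration file name is a placeholder):

SMILExtract -C config/my_configfile.conf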

Each configuration file can define additional command-line options. The most prominent examples are the options for in- and output files (-I and -O). These options are not shown when the normal help is invoked with the -h option. To show the options defined by a configuration file, use this command-line:
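A sketch of such a call (it assumes the -ccmdHelp switch, which recent SMILExtract versions use to print the options defined by a configuration file):

SMILExtract -ccmdHelp -C config/my_configfile.conf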

The default command-line for processing audio files for feature extraction is:
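A typical sketch, assuming the configuration defines the usual -I and -O options mentioned above (file names are placeholders):

SMILExtract -C config/my_configfile.conf -I input.wav -O output.file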

This runs SMILExtract with the configuration given in my_configfile.conf.

The following two sections will show you how to quickly get some advanced applications running as pre-configured use-cases for voice activity detection and speech emotion recognition.

b. Use-case: The openSMILE voice-activity detector

The latest openSMILE release (2.1) contains a research prototype of an intelligent, data-driven voice-activity detector (VAD) based on Long Short-Term Memory Recurrent Neural Networks (LSTM-RNN), similar to the system introduced in [Eyben13b].

The VAD examples are contained in the folder scripts/vad. A README in that folder describes further details. Here we give a brief tutorial on how to use the two included use-case examples:

• vad_opensource.conf: Runs the LSTM-RNN VAD and dumps the activations (voice probability) for each frame to a CSV text file. To run the example on a wave file, type a command like the one sketched after this list.

This will write the VAD probabilities scaled to the range -1 to +1 (2nd column) and the corresponding timestamps (1st column) to vad.csv. A VAD probability greater than 0 indicates voice presence.

• vad_segmenter.conf: Runs the VAD on an input wave file, and automatically extracts voice segments to new wave files. Optionally, the raw voicing probabilities as in the above example can be saved to file. To run the example on a wave file, type a command like the one sketched after this list.

This will create new wave files (numbered consecutively, starting at 1). The vad_segmenter.conf optionally supports output to CSV with the -csvoutput filename option. The start and end times (in seconds) of the voice segments relative to the start of the input file can optionally be dumped with the -saveSegmentTimes filename option. The columns of the output file are: segment filename, start (sec.), end (sec.), and length of the segment as number of raw (10 ms) frames.
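Example invocations for the two VAD configurations (a sketch only; the exact command-line options are defined inside the configuration files and may differ from the names shown here):

SMILExtract -C scripts/vad/vad_opensource.conf -I input.wav -csvoutput vad.csv
SMILExtract -C scripts/vad/vad_segmenter.conf -I input.wav -saveSegmentTimes segments.csv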

To visualise the VAD output over the waveform, we recommend using Sonic Visualiser. If you have Sonic Visualiser installed (on Linux) you can open both the wave file and the VAD output with this command:
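For example (assuming Sonic Visualiser is installed as the sonic-visualiser command and vad.csv is the file produced above):

sonic-visualiser input.wav vad.csv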

An annotation layer import dialog should appear. The first column should be detected as Time and the second column as value. If this is not the case, select these values manually, specify that timing is given explicitly (should be the default), and click OK. You should see something like this:


c. Use-case: Automatic speech emotion recognition

As of version 2.1, openSMILE supports running the emotion recognition models from the openEAR toolkit [Eyben09] in a live emotion recognition demo. In order to start this live speech emotion recognition demo, download the speech emotion recognition models and unzip them in the top-level folder of the openSMILE package. A folder named models should be created there which contains a README.txt and a sub-folder emo. If this is the case, you are ready to run the demo. Type:
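A sketch of the invocation (assuming the live demo configuration shipped with openSMILE 2.1 is config/emobase_live4.conf, as referenced in the note below):

SMILExtract -C config/emobase_live4.conf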

to run it. The classification output will be shown on the console.

NOTE: This example requires that you are running a binary with PortAudio support enabled. Refer to the openSMILE book for details on how to compile your binary with PortAudio support for Linux. For Windows, pre-compiled binaries (SMILExtractPA*.exe) are included, which should be used instead of the standard SMILExtract.exe for the above example.

If you want to choose a different audio recording device, use
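a command along these lines (a sketch; it assumes the configuration exposes a -device command-line option for the PortAudio device ID, which may be named differently):

SMILExtract -C config/emobase_live4.conf -device 2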

To see a list of available devices and their IDs, type:
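One possible sketch (assuming a -listdevices switch is mapped to the PortAudio component's device-listing function; check the configuration's own command-line help for the actual option name):

SMILExtract -C config/emobase_live4.conf -listdevices 1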

Note: If you have a different directory layout or have installed SMILExtract in a system path, you must make sure that the models are located in a directory named "models" in the directory from where you call the binary, or you must adapt the path to the models in the configuration file (emobase_live4.conf).

In openSMILE 2.1, the emotion recognition models can also be used for off-line/batch analysis. Two configuration files are provided for this purpose: config/emobase_live4_batch.conf and config/emobase_live4_batch_single.conf. The latter of the two will compute a single feature vector for the input file and return a single result. Use this if your audio files are already chunked into short phrases or sentences. The first, emobase_live4_batch.conf, will run an energy-based segmentation on the input and will return a result for every segment. Use this for longer, un-cut audio files. To run analysis in batch mode, type:
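A sketch of such a batch call (assuming the batch configuration defines the usual -I input option; the shell redirection sends the console output to result.txt):

SMILExtract -C config/emobase_live4_batch.conf -I example-audio/media-interpretation.wav > result.txt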

This will redirect the result(s) from SMILExtract's standard output (console) to the file result.txt. The file is by default in a machine-parseable format, where key=value tokens are separated by :: and a single result is given on each line. For example, analysing the file example-audio/media-interpretation.wav yields one such result line.

d. Understanding configuration files

The above pre-configured examples are a good quick-start to show the diverse potential of the tool. We will now take a deeper look at openSMILE configuration files. First, we will use simple, small configuration files, and modify these in order to understand the basic concepts of these files. Then, we will show you how to write your own configuration files from scratch.

The demo files used in this section are provided in the 2.1 release package in the folder config/demo. We will first start with demo1_energy.conf. This file extracts basic frame-wise logarithmic energy. To run this file on one of the included audio examples in the folder example-audio, type the following command:
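A sketch of the command (assuming demo1_energy.conf defines the usual -I and -O options):

SMILExtract -C config/demo/demo1_energy.conf -I example-audio/opensmile.wav -O energy.csv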


This will create a file called energy.csv. Its content should look similar to this:

The second example we will discuss here is the audio recorder example (audiorecorder.conf).

NOTE: This example requires that you are running a binary with PortAudio support enabled. Refer to the openSMILE book for details on how to compile your binary with PortAudio support for Linux. For Windows, pre-compiled binaries (SMILExtractPA*.exe) are included, which should be used instead of the standard SMILExtract.exe for the following example.

This example implements a simple live audio recorder. Audio is recorded from the default audio device to an uncompressed PCM wave file. To run the example and record to rec.wav, type:
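A sketch of the command (assuming audiorecorder.conf maps the output wave file to the -O option):

SMILExtract -C config/demo/audiorecorder.conf -O rec.wav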

Modifying existing configuration files is the fastest way to create custom extraction scripts. We will now change the demo1_energy.conf file to extract Root-Mean-Square (RMS) energy instead of logarithmic energy. This can be achieved by changing the respective options in the section of the cEnergy component (identified by the section heading [energy:cEnergy]), switching the logarithmic energy off and the RMS energy on, as sketched below.
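A sketch of the change (cEnergy provides boolean rms and log options; the section heading is the one quoted above):

Original setting in demo1_energy.conf:

[energy:cEnergy]
rms = 0
log = 1

Changed to RMS energy:

[energy:cEnergy]
rms = 1
log = 0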

As a second example, we will merge audiorecorder.conf and demo1_energy.conf to create a configuration file which computes the frame-wise RMS energy from live audio input. First, we start with concatenating the two files. On Linux, type:
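For example (the target file name live_energy.conf is an arbitrary choice, reused below when running the merged configuration):

cat config/demo/audiorecorder.conf config/demo/demo1_energy.conf > config/demo/live_energy.conf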

On Windows, use a text editor such as Notepad++ to combine the files via copy and paste. Now we must remove the cWaveSource component from the original demo1_energy.conf, as it should be replaced by the cPortaudioSource component of the audiorecorder.conf file. To do this, we search for the line that instantiates the wave source in the component manager section and comment it out by prefixing it with a ; or the C-style // or the script- and INI-style #. We also remove the corresponding configuration file section for waveSource. We do the same for the waveSink component and the corresponding section, to leave only the output of the computed frame-wise energy to a CSV file. Theoretically, we could also leave the waveSink section and component, but we would then need to change the command-line option defined for the output filename, as without any changes it is the same for the CSV output and the wave-file output. In this case we should replace the filename option in the waveSink section by a new command-line option, as sketched below.
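A sketch of both changes (instance and option names follow the demo configurations and are assumptions; the \cm syntax is explained later in this article):

In the [componentInstances:cComponentManager] section, disable the wave source:

;instance[waveSource].type = cWaveSource

If the waveSink section is kept, give its output its own command-line option:

[waveSink:cWaveSink]
filename = \cm[waveoutput{rec.wav}:output wave file]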

Now, run your new configuration file with:
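For example (assuming the CSV output of the merged configuration is still mapped to the -O option):

SMILExtract -C config/demo/live_energy.conf -O live_energy.csv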

and inspect the contents of the live_energy.csv file with a text editor.

openSMILE configuration files are made up of sections, similar to INI files. Each section is identified by a header which takes the form:
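The general form is (instancename and cComponentType are placeholders):

[instancename:cComponentType]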

The first part (instancename) is a custom-chosen name for the section. It must be unique throughout the whole configuration file and all included sub-files. The second part defines the type of this configuration section and thereby its allowed contents. The configuration section typename must be one of the available component names (from the list printed by the command SMILExtract -L), as configuration file sections are linked to component instances.

The contents of each section are lines of key=value pairs, until the next section header is found. Besides simple key=value pairs as in INI files, a more advanced structure is supported by openSMILE. The key can be a hierarchical name built of key1.subkey, for example, or an array such as keyarray[0] and keyarray[1]. On the other hand, the value field can also denote an array of values, if the values are separated by a semicolon (;). Quotes for the values are not needed and not yet supported, and multi-line values are not allowed. Boolean flags are always expressed as numeric values, with 1 for on or true and 0 for off or false. The keys are referred to as the configuration options of the components, i.e. those listed by the on-line help (SMILExtract -H cComponentType).

Since version 2.1, configuration sections can be split into multiple parts across the configuration file. That is, the same header (same instancename and typename) may occur more than once. In that case, all options from all occurrences will be joined.

There is one configuration section that must always be present: that of the component manager:
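A sketch of this section (the list of instances is illustrative; only the dataMemory line is mandatory, as explained below):

[componentInstances:cComponentManager]
instance[dataMemory].type = cDataMemory
instance[waveSource].type = cWaveSource
instance[framer].type = cFramer
instance[energy].type = cEnergy
instance[csvSink].type = cCsvSink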

The component manager is the main instance which creates all component instances of the currently loaded configuration, makes them read their configuration settings from the parsed configuration file (through the configManager component), and runs the tick-loop, i.e. the loop where data are processed incrementally by calling each component once to process newly available data frames. Each component that shall be included in the configuration must be listed in this section, and for each component listed there, a corresponding configuration file section with the same instancename and of the same component type must exist. The only exception is the first line, which instantiates the central dataMemory component. It must always be present in the instance list, but no configuration file section has to be supplied for it.

Each component that processes data has a data-reader and/or a data-writer sub-component, which are configurable via the reader and writer objects. The only options of interest to us now in these objects are the dmLevel options. These options configure the data-flow connections in your configuration file, i.e. they define in which order data is processed by the components, or in other words, which component is connected with which other component.

Each component that modifies data or creates data (i.e. reading it from external sources etc.) will write its data to a unique dataMemory location (called a level). The name of this location is defined in the configuration file via the option writer.dmLevel = name_of_level. The level names must be unique and only one single component can write to each level. Multiple components can, however, read from a single level, enabling re-use of already computed data by multiple components. E.g., we typically have a wave source component which reads audio data from an uncompressed audio file (see also the demo1_energy.conf file):
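A sketch of such a section (as in demo1_energy.conf; the option names are those of cWaveSource):

[waveSource:cWaveSource]
writer.dmLevel = wave
filename = input.wav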

The above reads data from input.wav into the dataMemory level wave. If next we want to chunk the audio data into overlapping analysis windows of 20 ms length at a rate of 10 ms, we need a cFramer component:
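A sketch of the framer section (frameSize and frameStep are the usual cFramer options, given in seconds):

[framer:cFramer]
reader.dmLevel = wave
writer.dmLevel = frames
frameSize = 0.020
frameStep = 0.010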

The crucial line in the above code is the line which sets the reader dataMemory level (reader.dmLevel = wave) to the output level of the wave source component – effectively connecting the framer to the wave source component.

To create new configuration files from scratch, a configuration file template generator is available. We will use it to create a configuration for computing magnitude spectra via the Fast Fourier Transform (FFT). The template file generator requires a list of components that we want to have in the configuration file, so we must build this list first. In openSMILE most processing steps are wrapped in individual components to increase flexibility and re-usability of intermediate data. For our example we thus need the following components:

• An audio file reader (cWaveSource),

• a component which generates short-time analysis frames (cFramer),

• a component which applies a windowing function to these frames, such as a Hamming window (cWindower),

• a component which performs an FFT (cTransformFFT),


• a component which computes spectral magnitudes from the complex FFT result (cFFTmagphase),

• and finally a component which writes the magnitude spectra to a CSV file (cCsvSink).

To generate our configuration file template, we thus run (note that the component names are case sensitive!):
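A sketch of this invocation (the output file name my_fft.conf is arbitrary; the switches are explained below):

SMILExtract -cfgFileTemplate -configDflt cWaveSource,cFramer,cWindower,cTransformFFT,cFFTmagphase,cCsvSink -l 0 -logfile my_fft.conf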

The switch -cfgFileTemplate enables the template file output, and makes -configDflt accept a comma-separated list of component names. If -configDflt is used by itself, it will print only the default configuration section of a single component (of which the name is given as argument to that option). This invocation of SMILExtract prints the configuration file template to the log (i.e., standard error and the (log-)file given by the -logfile option). The switch -l 0 suppresses all other log messages (by setting the log-level to 0), leaving only the configuration file template lines in the specified file.

The file generated by the above command cannot be used as is yet. We need to update the data-flow connections first. In our example this is trivial, as one component always reads from the previous one, except for the wave source, which has no reader. We have to set the writer level of the wave source, point the reader level of the framer to it, do the same for the windower (where we also change the windowing function from the default Hanning to Hamming), and continue in the same fashion all the way down to the csvSink component, as sketched below.
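A sketch of the resulting data-flow wiring (instance and level names are illustrative; winFunc = ham is assumed to select the Hamming window in cWindower):

[waveSource:cWaveSource]
writer.dmLevel = wave
filename = input.wav

[framer:cFramer]
reader.dmLevel = wave
writer.dmLevel = frames

[windower:cWindower]
reader.dmLevel = frames
writer.dmLevel = winframes
winFunc = ham

[fft:cTransformFFT]
reader.dmLevel = winframes
writer.dmLevel = fftcomplex

[fftmag:cFFTmagphase]
reader.dmLevel = fftcomplex
writer.dmLevel = fftmag

[csvSink:cCsvSink]
reader.dmLevel = fftmag
filename = smileoutput.csv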

The configuration file can now be used with the command:
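For example (assuming the template was saved as my_fft.conf):

SMILExtract -C my_fft.conf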

However, if you run the above, you will most likely get an error message that the file input.wav is not found. This is good news, as it first of all means you have configured the data-flow correctly. In case you did not, you will get error messages about missing data memory levels, etc. The missing file problem is due to the hard-coded input file name with the option filename = input.wav in the wave source section. If you change this line to filename = example-audio/opensmile.wav, your configuration will run without errors. It writes the result to a file called smileoutput.csv.

To avoid having to change the filenames in the configuration file for every input file you want to process, openSMILE provides a very convenient feature: it allows you to define command-line options in the configuration files. In order to use this feature, you replace the value of the filename by the command \cm[], e.g. for the input file:

and for the output file:

The syntax of the \cm command is: [longoptionName(shortOption-1charOnly){defaultvalue}:description for on-line help].
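A sketch of both (the option names inputfile/I and outputfile/O follow the convention used by the standard openSMILE configurations):

filename = \cm[inputfile(I){test.wav}:name of input file]
filename = \cm[outputfile(O){output.csv}:name of output file]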

e. Reference feature sets

A major advantage of openSMILE over related feature extraction toolkits is that it comes with several reference and baseline feature sets which were used for the INTERSPEECH Challenges (2009-2014) on Emotion, Paralinguistics and Speaker States and Traits, as well as the Audio-Visual Emotion Challenges (AVEC) from 2011-2013. All of the INTERSPEECH configuration files are found under config/ISxx_*.conf.

All the INTERSPEECH Challenge configuration files follow a common standard regarding the data output options they define. The default output file option (-O) defines the name of the WEKA ARFF file to which functionals are written. To save the data in CSV format additionally, use the option -csvoutput filename. To disable the default ARFF output, use -O ?. To enable saving of intermediate parameters, frame-wise Low-Level Descriptors (LLD), in CSV format, the option -lldoutput filename can be used. By default, lines are appended to the functionals ARFF and CSV files if they exist, but the LLD files will be overwritten. To change this behaviour, the boolean (1/0) options -appendstaticarff 1/0, -appendstaticcsv 1/0, and -appendlld 0/1 are provided.
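A sketch of a typical call combining these options (IS13_ComParE.conf is one of the Challenge configurations mentioned above; option names are those described in this paragraph):

SMILExtract -C config/IS13_ComParE.conf -I input.wav -O functionals.arff -csvoutput functionals.csv -lldoutput lld.csv -appendstaticarff 0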

Besides the Challenge feature sets, openSMILE 2.1 is capable of extracting parameters for the Geneva Minimalistic Acoustic Parameter Set (GeMAPS — submitted for publication as [Eyben14]; configuration files will be available together with publication of the article), which is a small set of acoustic parameters relevant for affective voice research. It was standardized and agreed upon by several research teams, including linguists, psychologists, and engineers.

Besides these large-scale brute-force acoustic feature sets, several other configuration files are provided for extracting individual LLD. These include Mel-Frequency Cepstral Coefficients (MFCC*.conf) and Perceptual Linear Predictive Coding Cepstral Coefficients (PLP*.conf), as well as the fundamental frequency and loudness (prosodyShsViterbiLoudness.conf, or smileF0.conf for fundamental frequency only).

Conclusion and summary

We have introduced openSMILE version 2.1 in this article and have given a hands-on practical guide on how to use it to extract audio features of out-of-the-box baseline feature sets, as well as customized acoustic descriptors. It was also shown how to use the voice activity detector and the pre-trained emotion models from the openEAR toolkit for live, incremental emotion recognition. The openSMILE toolkit features a large collection of baseline acoustic feature sets for paralinguistic speech and music analysis and a flexible and complete framework for audio analysis. In future work, more effort will be put into documentation, speed-up of the underlying framework, and the implementation of new, robust acoustic and visual descriptors.

Acknowledgements

This research was supported by an ERC Advanced Grant in the European Community's 7th Framework Programme under grant agreement 230331-PROPEREMO (Production and perception of emotion: an affective sciences approach) to Klaus Scherer and by the National Center of Competence in Research (NCCR) Affective Sciences financed by the Swiss National Science Foundation (51NF40-104897) and hosted by the University of Geneva.

The research leading to these results has received funding from the European Community's Seventh Framework Programme under grant agreement No. 338164 (ERC Starting Grant iHEARu).

The authors would like to thank audEERING UG (haftungsbeschränkt) for providing up-to-date pre-release documentation, computational resources, and great support in maintaining the free open-source releases.

Papers

[Eyben09] F. Eyben, M. Wöllmer, and B. Schuller, "openEAR - Introducing the Munich Open-Source Emotion and Affect Recognition Toolkit," in Proceedings of the 3rd International Conference on Affective Computing and Intelligent Interaction and Workshops (ACII 2009), vol. I, Amsterdam, The Netherlands, pp. 576-581, HUMAINE Association, IEEE, September 2009.

[Eyben10] F. Eyben, M. Wöllmer, and B. Schuller, "openSMILE - The Munich Versatile and Fast Open-Source Audio Feature Extractor," in Proceedings of the 18th ACM International Conference on Multimedia (ACM MM 2010), Florence, Italy, pp. 1459-1462, ACM, October 2010.

[Eyben13a] F. Eyben, F. Weninger, F. Groß, and B. Schuller, "Recent Developments in openSMILE, the Munich Open-Source Multimedia Feature Extractor," in Proceedings of the 21st ACM International Conference on Multimedia (ACM MM 2013), Barcelona, Spain, pp. 835-838, ACM, October 2013.

[Eyben13b] F. Eyben, F. Weninger, S. Squartini, and B. Schuller, "Real-life voice activity detection with LSTM Recurrent Neural Networks and an application to Hollywood movies," in 2013 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 483-487, 26-31 May 2013. doi: 10.1109/ICASSP.2013.6637694

[Eyben14] F. Eyben et al., "The Geneva Minimalistic Acoustic Parameter Set (GeMAPS) for Voice Research and Affective Computing," submitted to IEEE Transactions on Affective Computing, 2015.

Authors: Florian Eyben and Björn Schuller
Affiliations: F. Eyben: Technische Universität München, Munich, Germany; B. Schuller: Chair of Complex and Intelligent Systems (CIS), University of Passau, Passau, Germany, and Department of Computing, Imperial College London, London, UK.

Call for Workshop Proposals @ ACM Multimedia 2015

We invite proposals for Workshops to be held at the ACM Multimedia 2015 Conference. Accepted workshops will take place in conjunction with the main conference, which is scheduled for October 26-30, 2015, in Brisbane, Australia.

We solicit proposals for two different kinds of workshops: regular workshops and data challenge workshops.

Regular Workshops

The regular workshops should offer a forum for discussion of a broad range of emerging and specialized topics of interest to the SIG Multimedia community. There are a number of important issues to be considered when generating a workshop proposal:

1. The topic of the proposed workshop should offer a perspective distinct from and complementary to the research themes of the main conference. We therefore strongly advise carefully reviewing the themes of the main conference (which can be found here) when generating a proposal.

2. The SIG Multimedia community expects the workshop program to nurture and grow the workshop research theme towards becoming mainstream in the multimedia research field and one of the themes of the main conference in the future.

3. Interdisciplinary theme workshops are strongly encouraged.

4. Workshops should offer a discussion forum of a different type than that of the main conference. In particular, they should avoid becoming "mini-conferences" with accompanying keynote presentations and best paper awards. While formal presentation of ideas through regular oral sessions is allowed, we strongly encourage organizers to propose alternative ways to allow participants to discuss open issues, key methods and important research topics related to the workshop theme. Examples are panels, group brainstorming sessions, mini-tutorials around key ideas and proof-of-concept demonstration sessions.

Data Challenge Workshops

We are also seeking organizers to propose Challenge-Based Workshops. Both academic and corporate organizers are welcome.

Data Challenge workshops are solicited from both academic and corporate organizers. The organizers should provide a dataset that exemplifies the complexities of current and future multimodal/multimedia problems, and one or more multimodal/multimedia tasks whose performance can be objectively measured. Participants in the challenge will evaluate their methods against the challenge data in order to identify areas of strength and weakness. The best performing participating methods will be presented in the form of papers and oral/poster presentations at the workshop.

More information

For details on submitting workshop proposals and the evaluation criteria, please check the following site:

http://www.acmmm.org/2015/call-for-workshop-proposals/

Important dates:

• Proposal Submission: February 10, 2015

• Notification of Acceptance: February 27, 2015

Looking forward to receiving many excellent submissions!

Alan Hanjalic, Lexing Xie and Svetha Venkatesh
Workshop Chairs, ACM Multimedia 2015

MPEG Column: 110th MPEG Meeting

– original posts here by Multimedia Communication blog, Christian Timmerer, AAU/bitmovin

The 110th MPEG meeting was held at the Strasbourg Convention and Conference Centre featuring the following highlights:


• The future of video coding standardization

• Workshop on media synchronization

• Standards at FDIS: Green Metadata and CDVS

• What’s happening in MPEG-DASH?

Additional details about MPEG's 110th meeting can also be found here, including the official press release and all publicly available documents.

The Future of Video Coding Standardization

MPEG110 hosted a panel discussion about the future of video coding standardization. The panel was organized jointly by MPEG and ITU-T SG 16's VCEG, featuring Roger Bolton (Ericsson), Harald Alvestrand (Google), Zhong Luo (Huawei), Anne Aaron (Netflix), Stéphane Pateux (Orange), Paul Torres (Qualcomm), and JeongHoon Park (Samsung).

As expected, "maximizing compression efficiency remains a fundamental need" and, as usual, MPEG will study "future application requirements, and the availability of technology developments to fulfill these requirements". Therefore, two Ad-hoc Groups (AhGs) have been established which are open to the public:

• AHG on Future Video Coding Technology [email][subscription]

• AHG on industry needs for Future Video Coding[email][subscription]

The presentations of the brainstorming session on the future of video coding standardization can be found here.

Workshop on Media Synchronization

MPEG110 also hosted a workshop on media synchronization for hybrid delivery (broadband-broadcast) featuring six presentations "to better understand the current state-of-the-art for media synchronization and identify further needs of the industry":

• An overview of MPEG systems technologies providing advanced media synchronization, Youngkwon Lim, Samsung

• Hybrid Broadcast – Overview of DVB TM-Companion Screens and Streams specification, Oskar van Deventer, TNO

• Hybrid Broadcast-Broadband distribution for new video services: a use cases perspective, Raoul Monnier, Thomson Video Networks

• HEVC and Layered HEVC for UHD deployments, Ye-Kui Wang, Qualcomm

• A fingerprinting-based audio synchronization technology, Masayuki Nishiguchi, Sony Corporation

• Media Orchestration from Capture to Consumption, Rob Koenen, TNO

The presentation material is available here. Additionally, MPEG established an AhG on timeline alignment (that is how the project is internally called) to study use cases and solicit contributions on gap analysis as well as technical contributions [email][subscription].

Standards at FDIS: Green Metadata and CDVS

My first report on MPEG Compact Descriptors for Visual Search (CDVS) dates back to July 2011 and provides details about the call for proposals. Now, finally, the FDIS has been approved during the 110th MPEG meeting. CDVS defines a compact image description that facilitates the comparison and search of pictures that include similar content, e.g. when showing the same objects in different scenes from different viewpoints. The compression of key point descriptors not only increases compactness, but also significantly speeds up the search and classification of images within large image databases, when compared to a raw representation of the same underlying features. Application of CDVS for real-time object identification, e.g. in computer vision and other applications, is envisaged as well.

Another standard that reached FDIS status is entitled Green Metadata (first reported in August 2012). This standard specifies the format of metadata that can be used to reduce energy consumption from the encoding, decoding, and presentation of media content, while simultaneously controlling or avoiding degradation in the Quality of Experience (QoE). Moreover, the metadata specified in this standard can facilitate a trade-off between energy consumption and QoE. MPEG is also working on amendments to the ubiquitous MPEG-2 TS ISO/IEC 13818-1 and ISOBMFF ISO/IEC 14496-12 so that green metadata can be delivered by these formats.


What’s happening in MPEG-DASH?

MPEG-DASH is in a kind of maintenance modebut still receiving new proposals in the areaof SAND parameters and some core experiments aregoing on. Also, the DASH-IF is working towardsnew interoperability points and test vectors inpreparation of actual deployments. When speakingabout deployments, they are happening, e.g., a 40hlive stream right before Christmas (by bitmovin,a top-100 company that matters most in onlinevideo). Additionally, VideoNext was co-located withCoNEXT’14 targeting scientific presentations aboutthe design, quality and deployment of adaptivevideo streaming. Webex recordings of the talks areavailable here. In terms of standardization, MPEG-DASH is progressing towards the 2nd amendmentincluding spatial relationship description (SRD),generalized URL parameters and other extensions. Inparticular, SRD will enable new use cases which canbe only addressed using MPEG-DASH and the FDIS isscheduled for the next meeting which will be in Geneva,Feb 16-20, 2015. I’ll report on this within my next blogpost, stay tuned..

Christian Timmerer is a researcher, entrepreneur, and teacher on immersive multimedia communication, streaming, adaptation, and Quality of Experience. He is an Assistant Professor at Alpen-Adria-Universität Klagenfurt, Austria, and CIO at bitmovin, Austria. Follow him on Twitter at http://twitter.com/timse7 and subscribe to his blog at http://blog.timmerer.com.

Summary of the 5th BAMMF

Bay Area Multimedia Forum (BAMMF)

BAMMF is a Bay Area Multimedia Forum series. Experts from both academia and industry are invited to exchange ideas and information through talks, tutorials, posters, panel discussions and networking sessions. Topics of the forum include emerging areas in vision, audio, touch, speech, text, various sensors, human-computer interaction, natural language processing, machine learning, media-related signal processing, communication, and cross-media analysis. Talks at the event may cover advances in algorithms and development, demonstrations of new inventions, product innovation, business opportunities, etc. If you are interested in giving a presentation at the forum, please contact us.

The 5th BAMMF

The 5th BAMMF was held in the George E. Pake Auditorium in Palo Alto, CA, USA on November 20, 2014. The slides and videos of the speakers at the forum have been made available on the BAMMF web page, and we provide here an overview of their talks. For the speakers' bios, slides and videos, please visit the web page.

Industrial Impact of Deep Learning – From Speech Recognition to Language and Multimodal Processing

Li Deng (Deep Learning Technology Center, Microsoft Research, Redmond, USA)

Since 2010, deep neural networks have started making real impact in the speech recognition industry, building upon earlier work on (shallow) neural nets and (deep) graphical models developed by both the speech and machine learning communities. This keynote will first reflect on the historical path to this transformative success. The role of well-timed academic-industrial collaboration will be highlighted, as will the advances of big data, big compute, and the seamless integration between application-domain knowledge of speech and general principles of deep learning. Then, an overview will be given of the sweeping achievements of deep learning in speech recognition since its initial success in 2010 (as well as in image recognition since 2012). Such achievements have resulted in across-the-board, industry-wide deployment of deep learning. The final part of the talk will focus on applications of deep learning to large-scale language/text and multimodal processing, a more challenging area where a potentially much greater industrial impact than in speech and image recognition is emerging.

Brewing a Deeper Understanding of Images

Yangqing Jia (Google)

In this talk I will introduce the recent developments in the image recognition field from two perspectives: as a researcher and as an engineer. For the first part I will describe our recent entry "GoogLeNet" that won the ImageNet 2014 challenge, including the motivation for the model and the knowledge learned from its inception. For the second part, I will dive into the practical details of Caffe, an open-source deep learning library I created at UC Berkeley, and show how one could utilize the toolkit for a quick start in deep learning as well as for integration and deployment in real-world applications.

Applied Deep Learning

Ronan Collobert (Facebook)

I am interested in machine learning algorithms which can be applied in real-life applications and which can be trained on "raw data". Specifically, I prefer to trade simple "shallow" algorithms with task-specific handcrafted features for more complex ("deeper") algorithms trained on raw features. In that respect, I will present several general deep learning architectures which excel in performance on various Natural Language, Speech and Image Processing tasks. I will look into specific issues related to each application domain, and will attempt to propose general solutions for each use case.

Compositional Language and Visual Understanding

Richard Socher (Stanford)

In this talk, I will describe deep learning algorithms that learn representations for language that are useful for solving a variety of complex language tasks. I will focus on three projects:

• Contextual sentiment analysis (e.g. having an algorithm that actually learns what's positive in this sentence: "The Android phone is better than the iPhone")

• Question answering to win trivia competitions (like IBM Watson's Jeopardy system but with one neural network)

• Multimodal sentence-image embeddings to find images that visualize sentences and vice versa (with a fun demo!)

All three tasks are solved with a similar type of recursive neural network algorithm.

The November meeting was hosted by the following guest organizers:

Jianchao Yang (Adobe) is a research scientist in the Imagination Lab at Adobe Research, San Jose, California. He got his M.S. and Ph.D. degrees from the Electrical and Computer Engineering (ECE) Department of the University of Illinois at Urbana-Champaign (UIUC) in 2011, under the supervision of Professor Thomas S. Huang at the Beckman Institute. Before that, he received his Bachelor's degree from the EEIS Department of the University of Science and Technology of China (USTC) in 2006. His research interests are in the broad area of computer vision, machine learning, and image processing. Specifically, he has extensive experience in the following research areas: image categorization, object recognition and detection, image retrieval; image and video super-resolution, denoising and deblurring; face recognition and soft biometrics; sparse coding and sparse representation; and unsupervised learning, supervised learning, and deep learning.

Eugene Bart (PARC) is a member of the research staff at PARC, Palo Alto, California. He received his Ph.D. degree from the Weizmann Institute in 2004, under the supervision of Prof. Shimon Ullman. Prior to that, he received his B.Sc. degree in physics and computer science from Tel Aviv University. His research interests are in machine learning, computer vision, and biological vision.

NSF/Intel Partnership on Visual and Experiential Computing (VEC)

The NSF and Intel initiative aims to foster novel, transformative, multidisciplinary approaches that promote research in VEC technologies, taking into consideration the various challenges present in this field, and to strengthen a research community committed to advancing research and education at the confluence of VEC technologies and to transitioning its findings into practice.

PhD Thesis Summaries

Jason J Quinlan

Efficient delivery of scalable media streaming over lossy networks

Supervisor(s) and Committee member(s): Cormac Sreenan (main supervisor), Ahmed Zahran (supervisor), Gabriel-Miro Muntean (first opponent), Sabin Tabirca (second opponent)
URL: http://hdl.handle.net/10468/1756

Recent years have witnessed a rapid growth in the demand for streaming video over the Internet, exposing challenges in coping with heterogeneous device capabilities and varying network throughput. When we couple this rise in streaming with the growing number of portable devices (smart phones, tablets, laptops), we see an ever-increasing demand for high-definition video online while on the move. Wireless networks are inherently characterised by restricted shared bandwidth and relatively high error loss rates, thus presenting a challenge for the efficient delivery of high quality video. Additionally, mobile devices can support/demand a range of video resolutions and qualities. This demand for mobile streaming highlights the need for adaptive video streaming schemes that can adjust to available bandwidth and heterogeneity, and can provide graceful changes in video quality, all while respecting viewing satisfaction. In this context the use of well-known scalable media streaming techniques, commonly known as scalable coding, is an attractive solution and the focus of this thesis. In this thesis we investigate the transmission of existing scalable video models over a lossy network and determine how the variation in viewable quality is affected by packet loss. This work focuses on leveraging the benefits of scalable media, while reducing the effects of data loss on achievable video quality. The overall approach is focused on the strategic packetisation of the underlying scalable video and how to best utilise error resiliency to maximise viewable quality. In particular, we examine the manner in which scalable video is packetised for transmission over lossy networks and propose new techniques that reduce the impact of packet loss on scalable video by selectively choosing how to packetise the data and which data to transmit. We also exploit redundancy techniques, such as error resiliency, to enhance the stream quality by ensuring a smooth play-out with fewer changes in achievable video quality. The contributions of this thesis are the creation of new segmentation and encapsulation techniques which increase the viewable quality of existing scalable models by fragmenting and re-allocating the video sub-streams based on user requirements, available bandwidth and variations in loss rates. We offer new packetisation techniques which reduce the effects of packet loss on viewable quality by leveraging the increase in the number of frames per group of pictures (GOP) and by providing equality of data in every packet transmitted per GOP. These provide novel mechanisms for packetisation and error resiliency, as well as new applications for existing techniques such as Interleaving and Priority Encoded Transmission. We also introduce three new scalable coding models, which offer a balance between transmission cost and the consistency of viewable quality.
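The packetisation idea is easy to illustrate. The Python sketch below (our own toy example, not code from the thesis) interleaves the frames of one GOP across a fixed number of packets so that every packet carries an equal share of every frame; losing one packet then degrades all frames slightly instead of wiping out individual frames entirely:

def packetise_gop(frames, num_packets):
    # Interleave GOP frames across packets: packet i gets byte slice i of every frame.
    packets = [[] for _ in range(num_packets)]
    for frame in frames:
        chunk = max(1, len(frame) // num_packets)
        for i in range(num_packets):
            start = i * chunk
            end = len(frame) if i == num_packets - 1 else (i + 1) * chunk
            packets[i].append(frame[start:end])
    return packets

def reassemble(packets, lost):
    # Rebuild each frame from the packets that survived (lost packets leave gaps).
    num_frames = len(packets[0])
    return [b"".join(p[f] for i, p in enumerate(packets) if i not in lost)
            for f in range(num_frames)]

gop = [bytes([f]) * 1200 for f in range(8)]   # 8 dummy frames of 1200 bytes each
pkts = packetise_gop(gop, num_packets=4)
recovered = reassemble(pkts, lost={2})        # one packet lost
print([len(f) for f in recovered])            # every frame loses ~25%; none disappears

With whole-frame packetisation, by contrast, the same single loss would remove two complete frames from this eight-frame GOP, which is the kind of effect the thesis' packetisation and error-resiliency techniques aim to avoid.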

Mobile and Internet Systems Laboratory
URL: http://www.ucc.ie/en/misl/

Lilian Calvet

Structure-from-Motion paradigms integrating circular points: application to camera tracking

Supervisor(s) and Committee member(s):
Rapporteurs: Adrien Bartoli, Professor, Université d'Auvergne; Richard Hartley, Professor, Australian National University
Opponents: Peter Sturm, Director of Research, INRIA Rhône-Alpes; David Fofi, Professor, Université de Bourgogne
Supervisors: Vincent Charvillat, Professor, ENSEEIHT; Pierre Gurdjos, CNRS research engineer, ENSEEIHT
URL: https://tel.archives-ouvertes.fr/tel-00981191/

The thesis deals with the problem of 3D reconstruction of a rigid scene from a collection of views acquired by a digital camera. The problem addressed, referred to as the Structure-from-Motion (SfM) problem, consists in computing the camera motion (including its trajectory) and the 3D characteristics of the scene based on 2D trajectories of imaged features through the collection. We propose theoretical foundations to extend some SfM paradigms in order to integrate real as well as complex imaged features as input data, and more especially imaged circular points. Circular points of a projective plane consist in a complex conjugate point-pair which is fixed under plane similarity, thus endowing the plane with a Euclidean structure. We introduce the notion of circular markers, which are planar markers that allow to compute, without any ambiguity, the imaged circular points of their supporting plane in all views. Aside from providing very "rich" Euclidean information, such features can be matched even if they are arbitrarily positioned on parallel planes, thanks to their invariance under plane similarity, thus increasing their visibility compared to natural features. We show how to benefit from this geometric property in solving the projective SfM problem via a rank-reduction technique, referred to as projective factorization, of the matrix whose entries are images of real, complex and/or circular features. One of the critical issues in such an SfM paradigm is the self-calibration problem, which consists in updating a projective reconstruction into a Euclidean one. We explain how to use the Euclidean information provided by imaged circular points in the self-calibration algorithm operating in the dual projective space and relying on linear equations. All these contributions are finally used in an automatic camera tracking application relying on markers made up of concentric circles (called CCTags). The problem consists in computing the 3D camera motion based on a video sequence. This kind of application is generally used in the cinema or TV industry to create special effects. The camera tracking proposed in this work is designed to provide the best compromise between flexibility of use and accuracy.
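As background for readers unfamiliar with circular points (standard multiple-view-geometry material, not taken from the thesis), in a Euclidean coordinate frame of a plane the circular points are the conjugate pair on the line at infinity through which every circle passes, and their images constrain the image of the absolute conic, which is why the self-calibration equations mentioned above become linear:

\[
  \mathbf{I} = (1,\; i,\; 0)^{\top}, \qquad
  \mathbf{J} = \overline{\mathbf{I}} = (1,\; -i,\; 0)^{\top}.
\]
% A plane similarity (rotation by \theta, scale s, translation t) fixes the pair {I, J}:
\[
  H_s =
  \begin{pmatrix}
    s\cos\theta & -s\sin\theta & t_x \\
    s\sin\theta & \phantom{-}s\cos\theta & t_y \\
    0 & 0 & 1
  \end{pmatrix},
  \qquad H_s\,\mathbf{I} \sim \mathbf{I}, \quad H_s\,\mathbf{J} \sim \mathbf{J}.
\]
% If a marker lets us locate the imaged circular points of its supporting plane,
% they must lie on the image of the absolute conic \omega = (K K^{\top})^{-1}:
\[
  \tilde{\mathbf{i}}^{\top}\, \omega\, \tilde{\mathbf{i}} = 0, \qquad
  \tilde{\mathbf{j}}^{\top}\, \omega\, \tilde{\mathbf{j}} = 0,
\]
% i.e. linear equations in the entries of \omega per view and per observed plane,
% from which \omega, and hence the calibration matrix K, can be recovered.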

Recently published

ACM TOMM, Volume 11, Issue 2

Editor-in-Chief: Ralf Steinmetz
URL: http://dl.acm.org/citation.cfm?id=2716635&picked=prox
Published: December 2014
Sponsored by ACM SIGMM

The Transactions on Multimedia Computing, Communications and Applications are SIGMM's own Transactions. As a service to Records readers, we provide direct links to the ACM Digital Library for the papers of the latest TOMCCAP issue.

• Ying Zhang, Luming Zhang, Roger Zimmermann: Aesthetics-Guided Summarization from Multiple User Generated Videos

• Kiana Calagari, Mohammad Reza Pakravan, Shervin Shirmohammadi, Mohamed Hefeeda: ALP: Adaptive Loss Protection Scheme with Constant Overhead for Interactive Video Applications

• Dongni Ren, Yisheng Xu, S.-H. Gary Chan: Beyond 1Mbps Global Overlay Live Streaming: The Case of Proxy Helpers

• Shengsheng Qian, Tianzhu Zhang, Changsheng Xu, M. Shamim Hossain: Social Event Classification via Boosted Multimodal Supervised Latent Dirichlet Allocation

• Jun Ye, Kien A. Hua: Octree-Based 3D Logic and Computation of Spatial Relationships in Live Video Query Processing

• Yifang Yin, Zhijie Shen, Luming Zhang, Roger Zimmermann: Spatial-Temporal Tag Mining for Automatic Geospatial Video Annotation

• Chih-Wei Lin, Kuan-Wen Chen, Shen-Chi Chen, Cheng-Wu Chen, Yi-Ping Hung: Large-Area, Multilayered, and High-Resolution Visual Monitoring Using a Dual-Camera System

• Zhengyu Deng, Ming Yan, Jitao Sang, Changsheng Xu: Twitter is Faster: Personalized Time-Aware Video Recommendation from Twitter to YouTube

• Yongtao Hu, Jan Kautz, Yizhou Yu, Wenping Wang: Speaker-Following Video Subtitles

MMTC R-Letter Volume 5, Issue 6

Board Director: Christian Timmerer
Board Co-Directors: Weiyi Zhang and Yan Zhang
URL: http://committees.comsoc.org/mmc/r-letters/MMTC-RLetter-Dec2014.pdf
Published: December 2014

The objectives of the IEEE MMTC R-Letter are:

• Stimulate research on multimedia communication.

• Encourage researchers to submit papers (R-Letter CFP) to IEEE MMTC sponsored publications and conferences.

• Nominate papers published in IEEE MMTC sponsored publications/conferences for best paper awards.

The December 2014 issue contains:

• Message from the Review Board Directors

• The Potential Gain of Multiuser MIMO for Mobile Video Applications

• A short review for "Multiuser MIMO Scheduling for Mobile Video Applications" (Edited by Koichi Adachi)

• An Improved Method for Adding Depth to 2D Images and Movies

• A short review for "Robust Semi-Automatic Depth Map Generation in Unconstrained Images and Video Sequences for 2D to Stereoscopic 3D Conversion" (Edited by Carsten Griwodz)

• Reconstruct the World Across Time

• A short review for "Scene Chronology" (Edited by Jun Zhou)

• Efficient Feature Descriptors Encoding for Mobile Augmented Reality

• A short review for "Interframe Coding of Feature Descriptors for Mobile Augmented Reality" (Edited by Bruno Macchiavello)

• Paper Nomination Policy

• MMTC R-Letter Editorial Board

• Multimedia Communications Technical Committee Officers

MTAP Volume 73, Issue 3

Editor-in-Chief: Borko Furht
URL: http://link.springer.com/journal/11042/73/3/page/1
Published: December 2014

• Bo Wu, Linfeng Xu: Integrating bottom-up and top-down visual stimulus for saliency detection in news video

• Ting Luo, Gangyi Jiang, Xiaodong Wang, Mei Yu…: Stereo image watermarking scheme for authentication with self-recovery capability using inter-view reference sharing

• Marko Horvat, Nikola Bogunović, Krešimir Ćosić: STIMONT: a core ontology for multimedia stimuli description

• Zong Jie Xiang, Qiren Chen, Yuncai Liu: Feature correspondence in a non-overlapping camera network

• Md. Abdur Rahman, M. Shamim Hossain…: Context-aware multimedia services modeling: an e-Health perspective

• Zhengxin Fu, Bin Yu: Optimal pixel expansion of deterministic visual cryptography scheme

• Wai-Shing Cho, Kin-Man Lam: Image classification without segmentation using a hybrid pyramid kernel

• Hoshang Kolivand, Zakiah Noh, Mohd Shahrizal Sunar: A quadratic spline approximation using detail multi-layer for soft shadow generation in augmented reality

• Yurui Xie, Chao Huang, Linfeng Xu: Semantic superpixel extraction via a discriminative sparse representation

• Junni Zou, Lin Chen: Joint bandwidth allocation, data scheduling and incentives for scalable video streaming over peer-to-peer networks

• Sidra Riaz, Sang-Woong Lee: A robust multimedia authentication and restoration scheme in digital photography

• Lin Tzy Li, Daniel Carlos Guimarães Pedronette…: A rank aggregation framework for video multimodal geocoding

• Weidong Wang, Tao Nie, Zhipao Tu, Xiaohong Chen…: A threshold-adaptive film mode detection method in video de-interlacing

• Chih-Chieh Hsiao, Min-Jen Lo, Slo-Li Chu: Demand look-ahead memory access scheduling for 3D graphics processing units

• Musab S. Al-Hadrusi, Nabil J. Sarhan: A scalable delivery solution and a pricing model for commercial video-on-demand systems with video advertisements

• Haijiang Zhu, Xuan Wang, Jinglin Zhou, Xuejing Wang: Approximate model of fisheye camera based on the optical refraction

• Pradeep K. Atrey, Saeed Alharthi, M. Anwar Hossain…: Collective control over sensitive video data using secret sharing

• Xing Li, Tao Zhang, Yan Zhang, Wenxiang Li…: Quantitative steganalysis of spatial ±1 steganography in JPEG decompressed images

• Xavier Sevillano, Francesc Alías: A one-shot domain-independent robust multimedia clustering methodology based on hybrid multimodal fusion

• Sherin M. Youssef, Ahmed Abou ElFarag…: Adaptive video watermarking integrating a fuzzy wavelet-based human visual system perceptual model

• Norena Martin-Dorta, Isabel Sanchez-Berriel…: Virtual Blocks: a serious game for spatial ability improvement on mobile devices

• E. Wang, W. Yan: iNavigation: an image based indoor navigation system

• Rafael Martín, José M. Martínez: A semi-supervised system for players detection and tracking in multi-camera soccer videos

• Xueping Su, Jinye Peng, Xiaoyi Feng, Jun Wu…: Cross-modality based celebrity face naming for news image collections

• Weizhan Zhang, Zhichao Mo, Cheng Chen, Qinghua Zheng: CBC: Caching for cloud-based VOD systems

• Zhiyong Su, Lang Zhou, Guangjie Liu, Jianshou Kong…: Authenticating topological integrity of process plant models through digital watermarking

• Liyang Yu, Qi Han, Xiamu Niu: An improved contraction-based method for mesh skeleton extraction

• Suk-Hwan Lee, Won-Joo Hwang, Ki-Ryong Kwon: Perceptual 3D model hashing using key-dependent shape feature

• Chungsoo Lim, Jae-Hoon Choi, Sang Won Nam…: A new television audience measurement framework using smart devices

• Marc Caillet, Cécile Roisin, Jean Carrive: Multimedia applications for playing with digitized theater performances

• Belgacem Ben Youssef: A visualization tool of 3-D time-varying data for the simulation of tissue growth

• Mikel Labayen, Igor G. Olaizola, Naiara Aginako…: Accurate ball trajectory tracking and 3D visualization for computer-assisted sports broadcast

• Ali Abdullah Yahya, Jieqing Tan, Min Hu: A blending method based on partial differential equations for image denoising

• Chunlong Hu, Liyu Gong, Tianjiang Wang, Fang Liu…: An effective head pose estimation approach using Lie Algebrized Gaussians based face representation

• Yushu Zhang, Di Xiao, Wenying Wen, Ming Li: Cryptanalyzing a novel image cipher based on mixed transformed logistic maps

• Ali Al-Haj: A dual transform audio watermarking algorithm

• Suk-Hwan Lee, Won-Joo Hwang, Ki-Ryong Kwon: Polyline curvatures based robust vector data hashing

• K. Seetharaman, M. Kamarasan: Statistical framework for image retrieval based on multiresolution features and similarity method

• Yonggang Huang, Heyan Huang, Jun Zhang: A noisy-smoothing relevance feedback method for content-based medical image retrieval

• Carles Ventura, Verónica Vilaplana…: Improving retrieval accuracy of Hierarchical Cellular Trees for generic metric spaces

• Juncong Lin, Qian Sun, Guilin Li, Ying He: SnapBlocks: a snapping interface for assembling toy blocks with XBOX Kinect

• Jose M. Saavedra, Benjamin Bustos: Sketch-based image retrieval using keyshapes

• Yanfeng Sun, Huajie Jia, Yongli Hu, Baocai Yin: Color face recognition based on color image correlation similarity discriminant model

• Muhammad Shoaib, Uzair Ahmad, Atif Al-Amri: Multimedia framework to support eHealth applications

• Chia-Hung Yeh, Wen-Yu Tseng, Chia-Yen Chen, Yu-Dun Lin…: Popular music representation: chorus detection & emotion recognition

• Min-Jen Tsai, Jin-Shen Yin, Imam Yuadi, Jung Liu: Digital forensics of printed source identification for Chinese characters

• Zhongmiao Xiao, Xiaojun Qi: Complementary relevance feedback-based content-based image retrieval

• Wei Jiang, Yaowu Chen, Xiang Tian: Fast transcoding from H.264 to HEVC based on region feature analysis

• Reena Friedel, Oscar Figuerola, Hari Kalva…: Asset identification using image descriptors

• C. Balasubramanian, S. Selvakumar, S. Geetha: High payload image steganography with reduced distortion using octonary pixel pairing scheme

MTAP Volume 74, Issue 1

Editor-in-Chief: Borko Furht
URL: http://link.springer.com/journal/11042/74/1/page/1
Published: January 2015

Special issue on Recent advances in communication networks and multimedia technologies

Guest Editors: Yulei Wu, Peter Mueller, Jingguo Ge, Bahman Javadi

• Yulei Wu, Peter Mueller, Jingguo Ge, Bahman Javadi: Editorial: recent advances in communication networks and multimedia technologies

• Zheng Wan, Naixue Xiong, Laurence T. Yang: Cross-layer video transmission over IEEE 802.11e multihop networks

• Gaocai Wang, Nao Wang, Xin Yu, Taoshen Li…: Performance analysis of opportunistic scheduling in wireless multimedia and data networks using stochastic network calculus

• Yun Mao, Jun Peng, Ying Guo, Dazu Huang, Moon Ho Lee: On high-rate full-diversity space-time-frequency code with partial interference cancelation group decoding for frequency-selective channels

• Dingding Li, Hai Jin, Xiaofei Liao, Jia Yu: Improving write amplification in a virtualized and multimedia SSD system

• Yugen Yi, Baoxue Zhang, Jun Kong, Jianzhong Wang: An improved locality sensitive discriminant analysis approach for feature extraction

• Yuxiang Xie, Xiao-Ping Zhang, Xidao Luan, Li Liu…: A novel specific image scenes detection method

• Li Zhang, Wei-Da Zhou, Fan-Zhang Li: Kernel sparse representation-based classifier ensemble for face recognition

• Juncong Lin, Jiazhi Xia, Xing Gao, Minghong Liao…: Interior structure transfer via harmonic 1-forms

• Yu-Teng Jang, Shuchih Ernest Chang, Po-An Chen: Exploring social networking sites for facilitating multi-channel retailing

• Sang-Soo Yeo, Ken Chen, Honghai Liu: Pattern recognition technologies for multimedia information processing

• A-Ra Khil, Kang-Hee Lee: Optimization of a robot-served cart capacity using the three-dimensional single bin packing problem

• Jin Woo Choi, Taeg Keun Whangbo, Cheong Ghil Kim: A contour tracking method of large motion object using optical flow and active contour model

• Jaemin Soh, ByungOk Han, Yeongjae Choi, Youngmin Park…: Automatic registration of a virtual experience space with Kinect

• Ahra Jo, Gil-Jin Jang, Bohyung Han: Occlusion detection using horizontally segmented windows for vehicle tracking

• Dongwann Kang, Phutphalla Kong, KyungHyun Yoon…: Directional texture transfer for video

• Ji Hun Kang, Shin Jin Kang, SooKyun Kim: Line recognition algorithm for 3D polygonal model using a parallel computing platform

• Sooyoung Park, Hyunji Chung, Changhoon Lee…: Methodology and implementation for tracking the file sharers using BitTorrent

• Sanghoon Jun, Seungmin Rho, Eenjun Hwang: Music structure analysis using self-similarity matrix and two-stage categorization

• Hyunguk Yoo, Taeshik Shon: Novel Approach for Detecting Network Anomalies for Substation Automation based on IEC 61850

Job Opportunities

PhD Position in Layered Video Distribution over ICN

PROJECT TITLE: VidICN: Layered Video Distribution over Information Centric Networks

PROJECT DESCRIPTION:
This project will research and develop a solution to distribute layered video (specifically HEVC) over Information Centric Networks. Information Centric Networking (ICN) is a novel network architecture proposed for the future Internet. It is based on the observation that the major usage of the current Internet has changed from end-to-end communication to data distribution. According to Cisco's report, consumer Internet video traffic will be 69 percent of all consumer Internet traffic in 2017, up from 57 percent in 2012. Layered video is a promising technology to support various applications, e.g. adaptive video streaming and 3D video. The upcoming High Efficiency Video Coding (HEVC) standard supports 8K Ultra High Definition video (16 times as many pixels as current 1080p video) with up to twice the data compression of its predecessor H.264/AVC. It has also been extended to support Scalable Video (SHVC) and Multi-view Video (MV-HEVC). In-network caching is an important feature of ICN to improve distribution performance. Existing research concentrates on in-network caching for generic data traffic, whereas VidICN focuses on how different caching schemes affect the performance of layered video distribution, and then designs a routing and caching scheme for layered video (HEVC) distribution.

The key research objectives of the VidICN project are:
1) Experimental evaluation of the effects of different in-network caching models on the performance of layered video distribution.
2) Modelling of layered video distribution in various in-network caching scenarios.
3) Design of a dynamic content routing and caching method for layered video distribution.
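To make the caching question concrete, here is a small Python sketch (our own illustration, not project code) of an LRU-style in-network cache in which each cached item is one layer of one video segment, as an ICN router might hold named chunks. A client needing base plus enhancement layers issues one request per layer, and the hit rate directly exposes how a layer-aware eviction policy would matter:

from collections import OrderedDict

class LayerCache:
    # Toy LRU cache keyed by (segment, layer).
    def __init__(self, capacity):
        self.capacity = capacity
        self.store = OrderedDict()
        self.hits = 0
        self.requests = 0

    def request(self, segment, layer):
        self.requests += 1
        key = (segment, layer)
        if key in self.store:
            self.hits += 1
            self.store.move_to_end(key)      # refresh LRU position
        else:
            self.store[key] = True           # fetched from upstream, then cached
            if len(self.store) > self.capacity:
                self.store.popitem(last=False)

cache = LayerCache(capacity=20)
# Two clients: one watches the base layer only, one requests base + two enhancement layers.
for segment in range(50):
    cache.request(segment, layer=0)
    for layer in range(3):
        cache.request(segment, layer)
print("hit rate: %.2f" % (cache.hits / cache.requests))

A layer-aware policy could, for instance, protect base-layer chunks from eviction at the expense of enhancement layers; quantifying exactly such trade-offs is the kind of question the project sets out to answer.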

Duration of project: Maximum 48 months
Funding: Fees and stipends are covered by Science Foundation Ireland
Minimum qualifications/experience necessary/any other requirements: BSc, BEng with Honours (Grade 2.2); English: minimum IELTS 6.0 with no band less than 5.5 (for students with a degree from a non-English-speaking country); strong knowledge in computer networking; good experience in C/C++ and Linux.
Main Supervisor: Dr. Yuansong Qiao ([email protected])

APPLICATION:
Application forms are available from Anita Watts, Office of Research, Athlone Institute of Technology, Tel: +353 90 64 83061, Email: [email protected], or via the "Download Application Form" link on the AIT website under Vacancies – Postgraduate Research Opportunities (http://www.ait.ie/research/researchopportunities/). Completed application forms must be submitted to Anita Watts ([email protected]), Office of Research, Athlone Institute of Technology, Dublin Road, Athlone, Co. Westmeath, Ireland.

CLOSING DATE:
Please check the AIT website (http://www.ait.ie/research/researchopportunities/).

Employer: Software Research Institute, Athlone Institute of Technology, Ireland
Expiration date: Monday, August 31, 2015
More information: http://www.ait.ie/research/researchopportunities/

PhD Studentships (Video Streaming/SDN) in Cork, Ireland

PhD Studentships at University College Cork in Ireland

Closing Date for Applications: None, positions will remain open until filled. Applications will be reviewed as soon as they are received.

Project: An Internet Infrastructure for Video Streaming Optimisation (iVID)

The Mobile and Internet Systems Laboratory (MISL) in the Department of Computer Science at UCC is an internationally recognised research centre focused on innovative networking research. iVID is a new research project funded by Science Foundation Ireland to investigate the use of software defined networking (SDN) techniques to optimise the delivery of streaming video. A team of 5 project researchers will work on iVID, including 3 Ph.D. students. The project involves active collaboration with AT&T, EMC and UC Riverside.

Applications are invited for fixed-term studentships (annual value of €18K, plus fees) from suitably qualified candidates who wish to undertake a PhD within the Department of Computer Science. Applicants should have a Masters degree in computer science or a closely related discipline, although applications from truly exceptional students with a bachelor's degree will be considered. Ideally, applicants will have some project experience in the areas of video streaming, software defined networks, or more generally network protocols. Applicants must have strong mathematical ability and an interest in systems programming and experimental computer science. Applicants must demonstrate good inter-personal skills, and a high standard of spoken and written English. The positions are open to applicants of any nationality.

How to apply:
Applications should be sent by email to Mary ([email protected]) and must include "PhD Studentship iVID" in the subject line. Applications must include, in PDF format only:

1. a 300-word personal statement explaining your interest in the project and networking research;
2. a full CV;
3. a copy of transcript(s) showing the names of all courses taken and grades achieved;
4. summaries of projects (BSc/MSc), internships and relevant work experience completed.

For more information on MISL and the Department of Computer Science, please see the links below.

http://www.cs.ucc.ie/misl/

http://www.cs.ucc.ie/

Employer: University College Cork, Ireland
Expiration date: Monday, August 31, 2015
More information: http://www.cs.ucc.ie/misl

Postdoc on Machine Learning and Applications at the Australian National University

The Australian National University (ANU) is offering one postdoc research fellow position with a 2-year appointment funded by the Australian Research Council.

The research position seeks to push the frontiers in both machine learning methods and formulating real-world problems with machine learning. The position would suit early- to mid-career researchers interested in extending machine learning methods such as:

- structured prediction models
- inferring graph structure
- prediction on time series and graphs

and using these advances to answer questions related to

- social media
- recommender systems
- optimising computer systems
- scientific data

in collaboration with domain experts.

The ANU is a highly-ranked research-intensive university (world-wide rankings – QS: 25th, Times: 48th, Shanghai ARWU: 66th). The postdoc research fellow will be located in the strong ANU AI research group, with frequent interactions with the NICTA Machine Learning group of 20+ researchers on machine learning and data science. Career development opportunities include possible contributions to teaching, the ability to apply for independent grants with the ARC, inter-disciplinary collaborations, or work with government/industry. The annual salary starts at AUD 81K+ per year, plus benefits.

The research will be conducted under the supervision of Dr Lexing Xie and Dr Cheng Soon Ong, as well as collaborators at other Australian and New Zealand universities. Candidates having a solid machine learning and mathematical background, and a willingness to apply machine learning to large, real-world datasets, are encouraged to apply. Interested candidates may write to [email protected] with a CV by Jan 25, 2015.

Employer: The Australian National University
Expiration date: Saturday, January 31, 2015

Calls for Contribution

CFPs: Sponsored by ACM SIGMM

ICMR 2015

ACM International Conference on Multimedia Retrieval 2015

Submission deadline: 25. January 2015
Location: Shanghai, China
Dates: 23. June 2015 - 26. June 2015
More information: http://www.icmr2015.org/
Sponsored by ACM SIGMM

Effectively and efficiently retrieving information based on user needs is one of the most exciting areas in multimedia research. The Annual ACM International Conference on Multimedia Retrieval (ICMR) offers a great opportunity for exchanging leading-edge multimedia retrieval ideas among researchers, practitioners and other potential users of multimedia retrieval systems. This … Read more

Special Sessions @ ACM ICMR 2015

Special Sessions @ ACM International Conference on Multimedia Retrieval 2015

Submission deadline: 01. December 2014
Location: Shanghai, China
Dates: 23. June 2015 - 26. June 2015
More information: http://www.icmr2015.org/call_for_special_session
Sponsored by ACM SIGMM

ACM ICMR 2015 will include some Special Sessions for innovative and frontier topics in the field of multimedia retrieval. Each Special Session will include around 5 papers. A proposal for the Special Sessions should include: – Title of proposed session – An introduction stating the importance of the topic and … Read more

Tutorials @ ACM ICMR 2015

Tutorials @ ACM International Conference on Multimedia Retrieval 2015

Submission deadline: 26. January 2015
Location: Shanghai, China
Dates: 23. June 2015 - 26. June 2015
More information: http://www.icmr2015.org/call_for_tutorial_proposals
Sponsored by ACM SIGMM

ICMR 2015 will feature free tutorials on the first day (June 23rd, 2015) addressing the broad interests of the multimedia retrieval community. ICMR tutorials aim to provide a comprehensive overview of specific topics in multimedia retrieval. The topic should be of sufficient relevance with respect to the state-of-the-art and the … Read more

Workshops @ ACM ICMR 2015

Workshops @ ACM International Conference on Multimedia Retrieval 2015

Submission deadline: 01. December 2014
Location: Shanghai, China
Dates: 23. June 2015 - 26. June 2015
More information: http://www.icmr2015.org/call_for_workshop
Sponsored by ACM SIGMM

ACM ICMR 2015 invites active members of the community to submit workshop proposals. The selected workshops will focus on topics relevant to multimedia retrieval researchers and practitioners and should trigger lively discussions and interactions among participants. They should differ from the main topics covered in the ICMR conference, yet attract … Read more

CFPs: Sponsored by ACM (any SIG)

ACM FCRC 2015

Federated Computing Research Conference 2015

Submission deadline: 28. February 2015
Location: Portland, OR, USA
Dates: 12. June 2015 - 20. June 2015
More information: http://fcrc.acm.org

Sponsored by ACM

FCRC 2015 assembles a spectrum of affiliated research conferences and workshops into a week-long coordinated meeting held at a common time in a common place. This model retains the advantages of the smaller conferences, while at the same time it facilitates communication among researchers in different fields of computer science … Read more

ACM MoVid 2015

7th ACM Workshop on Mobile Video (MoVid 2015)

Submission deadline: 28. November 2014
Location: Portland, Oregon, USA
Dates: 20. March 2015
More information: http://eecs.ucf.edu/movid/
Sponsored by ACM

The focus of this workshop is to present and discuss recent advances in the broad area of mobile video services. Specifically, the workshop intends to address the following topics: (a) Novel mobile video applications and architectures; (b) Research challenges in developing new techniques for providing rich video experience on wireless … Read more

C&C ’15:

ACM Creativity and Cognition

Submission deadline: 06. January 2015
Location: Glasgow
Dates: 22. June 2015 - 25. June 2015
More information: http://cc15.cityofglasgowcollege.ac.uk/
Sponsored by ACM

ACM Creativity and Cognition 2015 invites papers, posters, demonstrations, workshops and artworks investigating how interactive computing systems and sociotechnical processes affect creativity. We cherish creativity as a wonderful aspect of human experience, transformative and potentially transcendental. Creativity is the partner of inspiration, of moments when we seem to go beyond … Read more

W4A 2015

International Web for All Conference

Submission deadline: 23. January 2015
Location: Florence, Italy
Dates: 18. May 2015 - 20. May 2015
More information: http://www.w4a.info
Sponsored by ACM

We welcome you to submit your best work on improving accessibility of the Web, Mobiles, and Wearables for people with and without disabilities. Main highlights: – Intuit will award $2,000 and $1,000 to the best technical and communication papers – The Paciello Group will award the winners of the Accessibility … Read more

CFPs: Sponsored by IEEE (any TC)

ICCCN 2015

The 24th International Conference on Computer Communication and Networks

Submission deadline: 26. February 2015
Location: Las Vegas, Nevada, USA
Dates: 03. August 2015 - 06. August 2015
More information: http://www.icccn.org/icccn15/
Sponsored by IEEE

ICCCN is one of the leading international conferences for presenting novel ideas and fundamental advances in the fields of computer communications and networks. ICCCN serves to foster communication among researchers and practitioners with a common interest in improving computer communications and networking through scientific and technological innovation.

ICETC 2015

The Second International Conference on Education Technologies and Computers (ICETC 2015)

Submission deadline: 15. April 2015
Location: University of the Thai Chamber of Commerce, Bangkok, Thailand
Dates: 20. May 2015 - 22. May 2015
More information: http://sdiwc.net/conferences/icetc2015/
Sponsored by IEEE

The conference, to be held at the University of the Thai Chamber of Commerce, Bangkok, Thailand, from May 20-22, 2015, aims to enable researchers to build connections between different digital applications. The conference welcomes papers on the following (but not limited to) research topics: – AV-Communication and Multimedia – Assessment … Read more

IEEE BigMM 2015

The First IEEE International Conference on Multimedia Big Data

Submission deadline: 19. December 2014
Location: Beijing, China
Dates: 20. April 2015 - 22. April 2015
More information: http://www.BigMM.org
Sponsored by IEEE

The IEEE International Conference on Multimedia Big Data (BigMM) is a premier world forum of leading scholars in the highly active area of multimedia big data research, development and applications. The first BigMM conference, with the theme "Multimedia: The Biggest Big Data", will be held in Beijing, China, from April 20 to … Read more

IEEE MMCloudCom 2015

IEEE International Workshop on Multimedia Cloud Communication (MMCloudCom 2015)

Submission deadline: 15. January 2015
Location: Hong Kong
Dates: 26. April 2015 - 01. May 2015
More information: https://sites.google.com/site/mmcloudcom2015/
Sponsored by IEEE

Papers describing original, previously unpublished research work, experimental efforts, practical experiences, and industrial and commercial developments in all aspects of Mobile Multimedia Communications for Mobile Computing Devices are solicited. Potential topics include, but are not limited to, the following areas of interest: 1. Multimedia Cloud Communication 2. Mobility in Multimedia … Read more

IEEE T-MM

IEEE Transactions on Multimedia

Special issue on “Multimedia: The BiggestBig Data”

Submission deadline: 28. February 2015Special issueMore information: http://www.signalprocessingsociety.org/tmm/tmm-special-issues/Sponsored by IEEE

Multimedia is increasingly becoming the “biggest bigdata” as the most important and valuable source forinsights and information. It covers from everyone’sexperiences to everything happening in the world.There will be lots of multimedia big data – surveillancevideo, entertainment and social media, medical images,consumer images, voice and video, … Read more

IEEE TMM

IEEE Transactions on Multimedia

Special Issue on Deep Learning for Multimedia Computing

Submission deadline: 15. March 2015
Special issue
More information: http://www.signalprocessingsociety.org/uploads/email/TMM_SI_deep_learning.html
Sponsored by IEEE

Conventional multimedia computing is often built on top of handcrafted features, which are often too restrictive in capturing complex multimedia content such as images, audio, text and user-generated data with domain-specific knowledge. Recent progress on deep learning opens an exciting new era, placing multimedia computing on a more rigorous foundation … Read more

IEEE WoWMoM 2015

Sixteenth International Symposium on a World of Wireless, Mobile and Multimedia Networks

Submission deadline: 28. November 2014
Location: Boston, MA, USA
Dates: 14. June 2015 - 17. June 2015
More information: http://csr.bu.edu/wowmom15/
Sponsored by IEEE

Abstract submission deadline: November 21, 2014. Full manuscript due: November 28, 2014. Acceptance notification: March 13, 2015. Sponsored by the IEEE Computer Society, Missouri University of Science and Technology, and the IEEE Computer Society TC on Computer Communications (TCCC). All submissions (Regular and Work in Progress papers) must describe original research, not published … Read more

Med-Hoc-Net 2015

14th IFIP Annual Mediterranean Ad Hoc Networking Workshop

Submission deadline: 23. February 2015
Location: Vilamoura, Algarve, Portugal
Dates: 17. June 2015 - 19. June 2015
More information: http://medhocnet2015.uc.pt/
Sponsored by IEEE

MED-HOC-NET 2015 focuses on fundamental network concepts, but also on topics related to those network paradigms attracting large attention from the scientific and industrial communities. Hence, the conference includes topics such as smart cities, smart transportation, IoT, data-centric operations, autonomous organization, seamless connectivity and cross-layer design and application scenarios (road … Read more

Workshops at BigMM 2015

IEEE Multimedia Big Data Call for Workshop Proposals

Submission deadline: 28. November 2014
Location: Beijing, China
Dates: 20. April 2015 - 22. April 2015
More information: http://www.bigmm2015.org/CallForWorkshopProposals.asp
Sponsored by IEEE

The IEEE BigMM'15 organizing committee invites proposals for workshops to be held in conjunction with the conference. The workshops aim to explore focused interest areas and provide international forums for researchers and industry practitioners to share their research results and practical development experiences on hot topics of multimedia big data … Read more

CFPs: Not ACM-/IEEE-sponsored

Advanced Crowdsourcing for Speech and Beyond @ INTERSPEECH 2015

Special Session on Advanced Crowdsourcing for Speech and Beyond

Submission deadline: 20. March 2015
Location: Dresden, Germany
Dates: 06. September 2015 - 10. September 2015
More information: http://interspeech2015.org/events/special-sessions/advanced-crowdsourcing-for-speech-and-beyond/

AIM @ EPIA 2015

Artificial Intelligence in Medicine

Submission deadline: 09. March 2015
Location: Coimbra, Portugal
Dates: 08. September 2015 - 11. September 2015
More information: http://epia2015.dei.uc.pt/artificial-intelligence-in-medicine/

CDVE 2015

The 12th International Conference on Cooperative Design, Visualization and Engineering

Submission deadline: 01. April 2015
Location: Mallorca, Spain
Dates: 20. September 2015 - 23. September 2015
More information: http://www.cdve.org

CVIU

Computer Vision and Image Understanding

Special Issue on Individual and Group Activities in Video Event Analysis

Submission deadline: 20. December 2014
Special issue
More information: http://www.journals.elsevier.com/computer-vision-and-image-understanding/call-for-papers/special-issue-on-individual-and-group-activities-in-video/

Demos @ WAIM 2015

The 16th International Conference on Web-Age Information Management

Submission deadline: 07. February 2015
Location: Qingdao, Shandong, China
Dates: 08. June 2015 - 10. June 2015
More information: http://www.cs.sdu.edu.cn/waim2015/

DIPECC 2015

The Third International Conference on Digital Information Processing, E-Business and Cloud Computing

Submission deadline: not specified
Location: University of Mauritius, Reduit, Mauritius
Dates: 29. June 2015 - 01. July 2015
More information: http://sdiwc.net/conferences/dipecc2015/

ICCCN 2015

The 24th International Conference on Computer Communication and Networks

Submission deadline: 26. February 2015
Location: Las Vegas, Nevada, USA
Dates: 03. August 2015 - 06. August 2015
More information: http://www.icccn.org/icccn15/

ICDIPC 2015

International Conference on Digital Information Processing and Communications

Submission deadline: 01. September 2015
Location: Sierre, Switzerland
Dates: 07. October 2015 - 09. October 2015
More information: http://sdiwc.net/conferences/icdipc2015/

IEEE IWASI 2015

6th International IEEE Workshop on Advances in Sensors and Interfaces

Submission deadline: 10. April 2015
Location: Gallipoli (Le), Italy
Dates: 18. June 2015 - 19. June 2015
More information: http://iwasi2015.poliba.it

IHCI 2014

6th International Conference on Intelligent Human Computer Interaction

Submission deadline: not specified
Location: Evry (near Paris), France
Dates: 08. December 2014 - 10. December 2014
More information: http://ihci2014.telecom-sudparis.eu/calls

IntelliSys 2015

SAI Intelligent Systems Conference 2015

Submission deadline: 15. April 2015
Location: London, UK
Dates: 10. November 2015 - 11. November 2015
More information: http://saiconference.com/IntelliSys2015/CallforPapers

INTERSPEECH 2015

Submission deadline: 20. March 2015
Location: Dresden, Germany
Dates: 06. September 2015 - 10. September 2015
More information: http://www.interspeech2015.org

ISSRMET 2015

The International Conference on Information System Security, Risk Management, and E-Commerce Transactions

Submission deadline: 10. February 2015
Location: Dubai, UAE
Dates: 04. March 2015 - 06. March 2015
More information: http://sdiwc.net/conferences/issrmet2015/

PAMUR 2015 @ ACM ICMR 2015

International Workshop on Personality and Affect in Multimedia Retrieval

Submission deadline: 10. February 2015
Location: Shanghai, China
Dates: 23. June 2015
More information: http://www.icmr2015.org/PAMUR2015

PIS @ WorldCist 2015

Pervasive Information Systems

Submission deadline: 14. December 2014
Location: Azores, Portugal
Dates: 01. April 2015 - 03. April 2015
More information: http://www.aisti.eu/worldcist15/index.php/workshops/pis

PIS @ WorldCist 2015

Pervasive Information Systems Workshop

Submission deadline: 07. December 2014
Location: Ponta Delgada, Azores, Portugal
Dates: 01. April 2015 - 03. April 2015
More information: http://www.aisti.eu/worldcist15/index.php/workshops/pis

SAI 2015

Science and Information Conference 2015

Submission deadline: 05. January 2015
Location: London, UK
Dates: 28. July 2015 - 30. July 2015
More information: http://thesai.org/SAIConference2015

SHREC 2015

Shape Retrieval Contest 2015

Submission deadline: not specified
Location: Zurich, Switzerland
Dates: 02. May 2015 - 03. May 2015
More information: http://vc.ee.duth.gr/3dor2015/#shrec

Synergies of Speech and Multimedia Technologies @ INTERSPEECH 2015

Special Session on Synergies of Speech and Multimedia Technologies at INTERSPEECH 2015

Submission deadline: 20. March 2015
Location: Dresden, Germany
Dates: 06. September 2015 - 10. September 2015
More information: http://multimediaeval.org/files/Interspeech2015_specialSession_SynergiesOfSpeechAndMultimediaTechnologies.html

Ubicomp 2015

The 2015 ACM International Joint Conference on Pervasive and Ubiquitous Computing

Submission deadline: 02. March 2015
Location: Osaka, Japan
Dates: 07. September 2015 - 11. September 2015
More information: http://www.ubicomp.org
In cooperation with ACM

WAIM 2015

International Conference on Web-Age Information Management

Submission deadline: 16. January 2015
Location: Qingdao, Shandong, China
Dates: 08. June 2015 - 10. June 2015
More information: http://www.cs.sdu.edu.cn/waim2015/

WS @ IEEE ICME 2015

Workshop on Multimedia for Cooking and Eating Activities

Submission deadline: 30. March 2015
Location: Torino, Italy
Dates: 03. July 2015
More information: http://www.mm.media.kyoto-u.ac.jp/CEA2015

WSICC @ ACM TVX 2015

Workshop on Interactive Content Consumption @ ACM TVX 2015

Submission deadline: 02. March 2015
Location: Brussels, Belgium
Dates: 03. June 2015
More information: http://wsicc.net/2015/

Back Matter

Notice to Contributing Authors to SIG Newsletters

By submitting your article for distribution in this Special Interest Group publication, you hereby grant to ACM the following non-exclusive, perpetual, worldwide rights:

• to publish in print on condition of acceptance by the editor

• to digitize and post your article in the electronic version of this publication

• to include the article in the ACM Digital Library and in any Digital Library related services

• to allow users to copy and distribute the article for noncommercial, educational or research purposes

However, as a contributing author, you retain copyright to your article and ACM will refer requests for republication directly to you.

Impressum

Editor-in-Chief

Carsten Griwodz, Simula Research Laboratory

Editors

Stephan Kopf, University of Mannheim
Viktor Wendel, Darmstadt University of Technology
Lei Zhang, Microsoft Research Asia
Pradeep Atrey, University of Winnipeg
Christian Timmerer, Klagenfurt University
Pablo Cesar, CWI
Mathias Lux, Klagenfurt University
Herman Engelbrecht, Stellenbosch University
Touradj Ebrahimi, Ecole Polytechnique Federale de Lausanne
Mohammad Anwar Hossain, King Saud University
Michael Riegler, Simula Research Laboratory
