Advances in Multimedia Computing

Exploring Music Collections in Virtual Landscapes

Peter Knees, Markus Schedl, Tim Pohle, and Gerhard Widmer
Johannes Kepler University Linz

IEEE MultiMedia, July–September 2007. 1070-986X/07/$25.00 © 2007 IEEE. Published by the IEEE Computer Society.
A user interface to music repositories called nepTune creates a virtual landscape for an arbitrary collection of digital music files, letting users freely navigate the collection. This is accomplished by automatically extracting features from the audio signal and clustering the music pieces; the clustering then drives the generation of a 3D island landscape.
The ubiquity of digital music is a characteristic of our time. Everyday life is shaped by people wearing earphones and listening to their personal music collection in virtually any situation. Indeed, we can argue that recent technical advancements in audio coding and the associated enormous success of portable MP3 players, especially Apple's iPod, have immensely added to forming the zeitgeist.
These developments are profoundly changing the way people use music. Music is becoming a commodity that is traded electronically, exchanged, shared (legally or not), and even used as a means for social communication and display of personality (witness the huge number of people putting their favorite tracks on their personal sites, such as MySpace).
Despite these rapid changes in the way people use music, methods of organizing music collections on computers and music players have basically remained the same. Owners of digital music collections traditionally organize their thousands of audio tracks in hierarchical directories, often structured according to the common scheme: genre, then artist, then album, then track. Indeed, people don't have much of a choice, given the options offered by current music players and computers.
The rapidly growing research field of music information retrieval is developing the technological foundations for a new generation of more intelligent music devices and services. Researchers are creating algorithms for audio and music analysis, studying methods for retrieving music-related information from the Internet, and investigating scenarios for using music-related information for novel types of computer-based music services. The range of applications for such technologies is broad, from automatic music recommendation services through personalized, adaptive radio stations to novel types of intelligent, reactive musical devices and environments.
At our institute, we are exploring ways of providing new views on the contents of digital music collections and new metaphors for interacting with music and collections of music pieces. At the confluence of artificial intelligence, machine learning, signal processing, data and Web mining, and multimedia, we develop algorithms for analyzing, interpreting, and displaying music in ways that are interesting, intuitive, and useful to human users. In this article, we describe a particular outcome of this research: an interactive, multimodal interface to music collections, which we call nepTune.
Our approach

The general philosophy underlying nepTune is that music collections should be structured (automatically, by the computer) and presented according to intuitive musical criteria. In addition, music interfaces should permit and encourage the creative exploration of music repositories and new ways of discovering hidden treasures in large collections. To make this kind of philosophy more popular, our first application is an interactive exhibit in a modern science museum.
The nepTune interface offers an original opportunity to playfully explore music by creating an immersive virtual reality founded in the sounds of a user's digital audio collection. We want the interface to be fun to use and engage people. The basic ingredients are as follows: Using intelligent audio analysis, nepTune clusters the pieces of music according to sound similarity. Based on this clustering, the system creates a 3D island landscape containing the pieces (see Figure 1). Hence, the resulting landscape groups similar-sounding pieces together. The more similar pieces the user owns, the higher the terrain in the corresponding region. The user can move through the virtual landscape and explore his or her collection. This visual approach essentially follows the islands of music metaphor (see Figure 2).1 Each music collection created has its own unique characteristics and landscape. In addition
to seeing the music pieces in the landscape, the listener hears the pieces closest to his or her current position. Thus, the user gets an auditory impression of the musical style in the surrounding region via a 5.1 surround sound system.
Furthermore, listeners can enrich the landscape with semantic and visual information acquired via Web retrieval techniques. Instead of displaying the song title and performing artist on the landscape, the user can choose to see words that describe, or images related to, the heard music. Thus, besides a purely audio-based structuring, nepTune also offers more contextual information that might trigger new associations in the listener/viewer, making the experience more interesting and rewarding.
Application realization

Here, we describe the realization of the nepTune music interface.
Interface concept

Our intention is to provide an interface to music collections that goes beyond conventional computer interaction metaphors. The first step toward this is to create an artificial but nevertheless appealing landscape that encourages the user to explore a music collection interactively. Furthermore, we refrain from using the kind of standard user interface components contained in almost every window toolkit. Rather than constructing an interface that relies on the classical point-and-click scheme best controlled through a mouse, we made the whole application controllable with a standard game pad such as those used for video games. From our point of view, a game pad is perfectly suited for exploration of the landscape as it provides the necessary functionality to navigate in 3D while being easy to handle. Furthermore, the resemblance to computer games is absolutely intentional. Therefore, we kept the controlling scheme simple (see Figure 3).
As mentioned before, another important interface characteristic is that it plays the music surrounding the listener during navigation. Hence, it's not necessary to select each song manually and scan it for interesting parts. While the user explores the collection, he or she automatically hears audio thumbnails from the closest music pieces, giving immediate auditory feedback on the style of music in the current region. Thus, users directly experience the meaningfulness of the spatial distribution of music pieces in the virtual landscape.
Finally, we want to incorporate information beyond the pure audio signal. In human perception, music is always tied to personal and cultural influences that analyzing the audio can't capture. For example, cultural factors comprise time-dependent phenomena, marketing, or even influences by the user's peer group. Since we also intend to account for some of these aspects to
Figure 1. An island landscape created from a music collection. Listeners explore the collection by freely navigating through the landscape and hearing the music typical for the region around their current position.
Figure 2. Screen shot of the nepTune interface. The large peaky mountain in the front contains classical music. The classical pieces are clearly separated from the other musical styles on the landscape. The island in the left background contains alternative rock, while the islands on the right contain electronic music.
Figure 3. The controlling scheme of nepTune. One analog stick moves the viewer (left/right/forward/backward) and the other rotates the view (left/right/up/down), so only the two analog sticks are necessary for navigation. The directional buttons up and down adjust the viewer's distance to the landscape (zoom in and zoom out). Buttons 1 through 4 switch between the different labeling modes.
provide a comprehensive interface to music collections, we try to exploit information from the Web. The Web is the best available source for information regarding social factors as it represents current trends like no other medium. The method we propose next is but a first simple step toward capturing such aspects. More specialized Web-mining methods will be necessary for getting at truly cultural and social information.
nepTune provides four modes to explore the landscape. In the default mode, it displays the artist and track names as given by the MP3 files' ID3 tags (see http://www.id3.org). Alternatively, the system can hide this information, which focuses the users' exploration on the spatialized audio sensation. In the third mode, the landscape is enriched with words describing the heard music. The fourth mode displays images gathered automatically from the Web that are related to the semantic descriptors and the contained artists, which further deepens the multimedia experience. Figure 4 shows screen shots from all four modes.
In summary, the nepTune multimedia application examines several aspects of music and incorporates information at different levels of music perception, from the pure audio signal to culturally determined metadescriptions, which
Background

At the heart of many intelligent digital music applications are the notion of musical similarity and computational methods for estimating the similarity of music pieces, as it might be perceived by human listeners. The most interesting, but also most challenging, approach to accomplish this is to infer the similarity information directly from the audio signal via relevant feature extraction. Music similarity measures have become a large research topic in music information retrieval; an introduction to this topic is available elsewhere.1-3
The second major step in creating a nepTune-like interface is the automatic structuring of a collection, given pairwise similarity relations between the individual tracks. This is essentially an optimization problem: to place objects into a presentation space so that pairwise similarity relations are preserved as much as possible. In the field of music information retrieval, a frequently used approach is to apply a self-organizing map (SOM) to arrange a music collection on a 2D map that the user can intuitively read.4
The most important approach that uses SOMs to structure music collections is the islands of music interface.5 The islands of music approach calculates a SOM on so-called fluctuation pattern features that model the music's rhythmic aspects. Applying a smoothed data histogram technique visualizes the calculated SOM. Finally, the system applies a color model inspired by geographical maps. Thus, on the resulting map, blue regions (oceans) indicate areas onto which few pieces of music are mapped, whereas clusters containing a larger quantity of pieces are colored in brown and white (mountains and snow). Several extensions have been proposed; for example, a hierarchical component to cope with large music collections.6
Similar interfaces use SOM derivatives7 or use SOMs for intuitive playlist generation on portable devices.8
With our work, we follow Pampalk's islands of music approach and (literally) raise it to the next dimension by providing an interactive 3D interface. Instead of just presenting a map, we generate a virtual landscape that encourages the user to freely navigate and explore the underlying music collection. We also include spatialized audio playback. Hence, while moving through the landscape, the user hears audio thumbnails of close songs. Furthermore, we incorporate procedures from Web retrieval in conjunction with a SOM labeling strategy to display words that describe the styles of music, or images related to these styles, in the different regions on the landscape.
References

1. E. Pampalk, S. Dixon, and G. Widmer, "On the Evaluation of Perceptual Similarity Measures for Music," Proc. 6th Int'l Conf. Digital Audio Effects (DAFx), 2003, pp. 6-12; http://www.elec.qmul.ac.uk/dafx03/proceedings/pdfs/dafx02.pdf.
2. E. Pampalk, "Computational Models of Music Similarity and their Application to Music Information Retrieval," doctoral dissertation, Vienna Univ. of Technology, 2006.
3. T. Pohle, "Extraction of Audio Descriptors and their Evaluation in Music Classification Tasks," master's thesis, TU Kaiserslautern, German Research Center for Artificial Intelligence (DFKI), Austrian Research Inst. for Artificial Intelligence (OFAI), 2005; http://kluedo.ub.uni-kl.de/volltexte/2005/1881/.
4. T. Kohonen, Self-Organizing Maps, vol. 30, Springer Series in Information Sciences, 3rd ed., Springer, 2001.
5. E. Pampalk, "Islands of Music: Analysis, Organization, and Visualization of Music Archives," master's thesis, Vienna Univ. of Technology, 2001.
6. M. Schedl, "An Explorative, Hierarchical User Interface to Structured Music Repositories," master's thesis, Vienna Univ. of Technology, 2003.
7. F. Mörchen et al., "Databionic Visualization of Music Collections According to Perceptual Distance," Proc. 6th Int'l Conf. Music Information Retrieval (ISMIR), 2005, pp. 396-403; http://ismir2005.ismir.net/proceedings/1051.pdf.
8. R. Neumayer, M. Dittenbach, and A. Rauber, "PlaySOM and PocketSOMPlayer, Alternative Interfaces to Large Music Collections," Proc. 6th Int'l Conf. Music Information Retrieval (ISMIR), 2005, pp. 618-623; http://ismir2005.ismir.net/proceedings/1096.pdf.
offers the opportunity to discover new aspects of music. This should make nepTune an interesting medium to explore music collections, unrestrained by stereotyped thinking.
The user’s view We designed the current application to serve
as an exhibit in a public space, that is, in a mod-ern science museum. Visitors are encouraged tobring their own collection—for example, on aportable MP3 player—to explore their collectionin the virtual landscape. Thus, our main focuswas not on the system’s applicability as a prod-uct ready to use at home. However, we couldachieve this with little effort by incorporatingstandard music player functionalities.
In the application’s current state, the userinvokes the exploration process through con-necting his or her portable music player via a USBport. nepTune automatically recognizes this, andthe system then randomly extracts a predefinednumber of audio files from the player and startsto extract audio features (mel frequency cepstralcoefficients, or MFCCs) from these. A specialchallenge for applications presented in a publicspace is to perform computationally expensivetasks, such as audio feature analysis, while keep-ing visitors motivated and convincing them thatthere is actually something happening. Wedecided to visualize the progress of audio analysisvia an animation: small, colored cubes displaythe number of items left to process. For eachtrack, a cube with the number of the track popsup in the sky. When an audio track’s processingis finished, the corresponding cube drops downand splashes into the sea. After the systemprocesses all tracks, an island landscape that con-tains the tracks emerges from the sea. After this,the user can explore the collection.
The system projects a 3D landscape onto the wall in front of the user. While moving through the terrain, the listener can hear the closest sounds with respect to his or her position, coming from the directions of the music piece locations, to emphasize the immersion. Thus, in addition to the visual grouping of pieces conveyed by the islands metaphor, users can also perceive islands in an auditory manner, since they can hear typical sound characteristics for different regions. To provide optimal sensation related to these effects, the system outputs sounds via a 5.1 surround audio system.
Detaching the USB storage device (that is, the MP3 player) causes all tracks on the landscape to immediately stop playback. This action also disables the game pad and moves the viewer's position back to the start. Subsequently, the landscape sinks back into the sea, giving the next user the opportunity to explore his or her collection.
Technical realization

Here, we explain the techniques behind nepTune: feature extraction from the audio signal, music piece clustering and projection to a map, landscape creation, and landscape enrichment with descriptive terms and related images.
Audio feature extraction. Our application automatically detects new storage devices on the computer and scans them for MP3 files. nepTune randomly chooses a maximum of 50 of the contained files. We have limited the number of files mainly for time reasons, helping make the application accessible to many users. From the chosen audio files, the system extracts and analyzes the middle 30 seconds. These 30 seconds also serve as looped audio thumbnails in the landscape. The idea is to extract the audio features only from a consistent and typical section of the track.
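As a rough illustration, selecting the centered 30-second excerpt might look like the following sketch (the function name and the list-of-samples representation are our own simplifications; the real system, written in Java, operates on decoded MP3 audio):

```python
def middle_excerpt(samples, sample_rate, seconds=30.0):
    """Return the centered `seconds`-long slice of a signal.

    If the track is shorter than the requested excerpt, the whole
    signal is returned. The same slice can then be reused both for
    feature extraction and as the looped audio thumbnail.
    """
    want = int(seconds * sample_rate)
    if len(samples) <= want:
        return samples
    start = (len(samples) - want) // 2
    return samples[start:start + want]
```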
For calculating the audio features, we build upon the method proposed by Mandel and Ellis.2 Like the earlier approach by Aucouturier, Pachet, and Sandler,3 this approach is based on MFCCs, which model timbral properties. For
Figure 4. Screen shots from the same scene in the four different modes: (a) the plain landscape in mode 1; (b) mode 2, which displays artist and song name; (c) mode 3 shows typical words that describe the music, such as rap, gangsta, west coast, lyrical, or mainstream; (d) a screen shot in mode 4, where related images from the Web are presented on the landscape. In this case, these images show rap artists as well as related artwork.
each audio track, the system computes MFCCs on short-time audio segments (frames) to get a coarse description of the envelope of the individual analysis frames' frequency spectrum. The system then models the MFCC distribution over all of a track's frames via a Gaussian distribution with a full covariance matrix. Each music piece is thus represented by a distribution. The approach then derives similarity between two music pieces by calculating a modified Kullback-Leibler distance on the means and covariance matrices. Pairwise comparison of all pieces results in a similarity matrix, which is used to cluster similar pieces.
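The single-Gaussian track model and a symmetrized Kullback-Leibler distance admit closed-form computation. The following NumPy sketch illustrates the idea under the assumption that MFCC frames are already available as an array; the symmetrization shown here is one common choice of "modified" KL distance, not necessarily the exact variant used in nepTune:

```python
import numpy as np

def gaussian_from_mfccs(mfccs):
    """Model a track's MFCC frames (n_frames, n_coeffs) as a single
    Gaussian with full covariance, as in song-level timbre features."""
    return mfccs.mean(axis=0), np.cov(mfccs, rowvar=False)

def kl_gauss(mu_p, cov_p, mu_q, cov_q):
    """Closed-form KL(p || q) for two multivariate Gaussians."""
    d = len(mu_p)
    inv_q = np.linalg.inv(cov_q)
    diff = mu_q - mu_p
    return 0.5 * (np.trace(inv_q @ cov_p) + diff @ inv_q @ diff - d
                  + np.log(np.linalg.det(cov_q) / np.linalg.det(cov_p)))

def track_distance(g1, g2):
    """Symmetrized KL distance between two (mean, covariance) models."""
    return kl_gauss(*g1, *g2) + kl_gauss(*g2, *g1)
```

Pairwise evaluation of `track_distance` over all tracks yields the similarity matrix described above.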
Landscape generation. To generate a landscape from the derived similarity information, we use a self-organizing map.4 The SOM organizes multivariate data on a (usually 2D) map in such a manner that similar data items in the high-dimensional space are projected to similar map locations. Basically, the SOM consists of an ordered set of map units, each of which is assigned a model vector in the original data space. A SOM's set of all model vectors is called its codebook. There exist different strategies to initialize the codebook; we use linear initialization.4
For training, we use the batch SOM algorithm:5 first, for each data item x, we calculate the Euclidean distance between x and each model vector. The map unit possessing the model vector closest to a data item x is referred to as the best matching unit and represents x on the map. In the second step, the codebook is updated by calculating weighted centroids of all data elements associated with the corresponding model vectors. This reduces the distances between the data items and the model vectors of the best matching units and their surrounding units, which participate to a certain extent in the adaptations. The adaptation strength decreases gradually and depends on both unit distance and iteration cycle. This supports the formation of large clusters at the beginning of training and fine-tuning toward the end. Usually, the iterative training continues until a convergence criterion is fulfilled.
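A minimal batch-SOM training loop along these lines might look as follows. This is a NumPy sketch with our own simplifications: evenly spaced data points stand in for the linear initialization, a Gaussian neighborhood kernel with a shrinking radius stands in for the adaptation schedule, and a fixed iteration count replaces the convergence criterion:

```python
import numpy as np

def train_batch_som(data, rows, cols, iters=30):
    """Batch-SOM sketch: organize data of shape (n, d) on a rows x cols map."""
    n, d = data.shape
    units = rows * cols
    # grid coordinates of the map units (used for neighborhood distances)
    grid = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    # simple deterministic initialization: evenly spaced data points
    codebook = data[np.linspace(0, n - 1, units).astype(int)].astype(float)
    for it in range(iters):
        # step 1: best matching unit (closest model vector) for every item
        bmu = np.argmin(((data[:, None, :] - codebook[None, :, :]) ** 2).sum(-1), axis=1)
        # neighborhood radius shrinks as training proceeds
        sigma = max(0.5, 0.5 * (rows + cols) * (1.0 - it / iters))
        # step 2: weighted centroids; units close (on the grid) to an item's
        # BMU participate in the update according to a Gaussian kernel
        dist2 = ((grid[:, None, :] - grid[None, bmu, :]) ** 2).sum(-1)  # (units, n)
        h = np.exp(-dist2 / (2.0 * sigma ** 2))
        codebook = (h @ data) / h.sum(axis=1, keepdims=True)
    # final BMU assignment with the trained codebook
    bmu = np.argmin(((data[:, None, :] - codebook[None, :, :]) ** 2).sum(-1), axis=1)
    return codebook, bmu
```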
To create appealing visualizations of the SOM's data clusters, we calculate a smoothed data histogram (SDH).6 An SDH creates a smooth height profile (where height corresponds to the number of items in each region) by estimating the data item density over the map. To this end, each data item votes for a fixed number n of best matching map units. The best matching unit receives n points, the second best n − 1, and so on. Accumulating the votes results in a matrix describing the distribution over the complete map. After each piece of music has voted, interpolating the resulting matrix yields a smooth visualization. Additionally, the user can apply a color map to the interpolated matrix to emphasize the resulting height profile. We apply a color map similar to the one used in the islands of music to give the impression of an island-like terrain.
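The voting scheme can be sketched as follows; plain bilinear upsampling stands in here for the original interpolation step, and all function names are ours:

```python
import numpy as np

def smoothed_data_histogram(data, codebook, grid_shape, n=3, upsample=8):
    """SDH sketch: each item votes for its n best matching units
    (n points for the best, n-1 for the second best, ...), and the
    accumulated vote matrix is upsampled into a smooth height profile."""
    rows, cols = grid_shape
    votes = np.zeros(rows * cols)
    for x in data:
        order = np.argsort(((codebook - x) ** 2).sum(axis=1))
        for rank in range(n):
            votes[order[rank]] += n - rank
    votes = votes.reshape(rows, cols)

    def bilinear(m, r, c):
        # interpolate the vote matrix at fractional grid position (r, c)
        r0, c0 = int(np.floor(r)), int(np.floor(c))
        r1, c1 = min(r0 + 1, rows - 1), min(c0 + 1, cols - 1)
        fr, fc = r - r0, c - c0
        return ((1 - fr) * (1 - fc) * m[r0, c0] + (1 - fr) * fc * m[r0, c1]
                + fr * (1 - fc) * m[r1, c0] + fr * fc * m[r1, c1])

    rr = np.linspace(0, rows - 1, rows * upsample)
    cc = np.linspace(0, cols - 1, cols * upsample)
    return np.array([[bilinear(votes, r, c) for c in cc] for r in rr])
```

The returned height map is what the color model (ocean to mountains) is applied to.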
Based on the calculated SDH, we create a 3D landscape model that contains the musical pieces. However, the SOM representation only assigns the pieces to a cluster rather than to a precise position. Thus, we have to elaborate a strategy to place the pieces on the landscape. The simplest approach would be to spread them randomly in the region of their corresponding map unit. That has two drawbacks. The first is the overlap of labels, which occurs particularly often for pieces with long names and results in cluttered maps. The second drawback is the loss of ordering of the pieces. It's desirable to have placements on the map that reflect the positions in feature space in some way.
The solution we adopted is to define a minimum distance d between the pieces and place the pieces on concentric circles around the map unit's center so that this distance is always guaranteed. To preserve at least some of the similarity information from feature space, we sort all pieces according to their distance to the model vector of their best matching unit in feature space. The first item is placed in the center of the map unit. Then, on the first surrounding circle, which has a radius of d and thus a perimeter of 2πd, we can place at most ⌊2π⌋ = 6 pieces so that d is maintained. The next circle (radius 2d) can host up to ⌊4π⌋ = 12 pieces, and so on. For map units with few items, we scale up the circle radii to distribute the pieces as far as possible
within the unit’s boundaries. As a result, thepieces most similar to the cluster centers stay inthe centers of their map units and distances arepreserved to some extent. More complex (andcomputationally demanding) strategies are con-ceivable, but this simple approach works wellenough for our scenario.
Displaying labels and images. An important aspect of our user interface is the incorporation of related information extracted automatically from the Web. The idea is to augment the landscape with music-specific terms commonly used to describe the music in the current region. We exploit the Web's collective knowledge to figure out which words are typically used in the context of the represented artists.
To determine descriptive terms, we use a music description map.7 For each contained artist, we send a query consisting of the artist's name and the additional constraint "music style" to Google. We retrieve Google's result page containing links to the first 100 pages. Instead of downloading each of the returned sites, we directly analyze the complete result page, that is, the text snippets presented. Thus, we just have to download one Web page per artist. To avoid the occurrence of unrelated words, we use a domain-specific vocabulary containing 945 terms. Besides some adjectives related to moods and geographical names, these terms consist mainly of genre names, musical styles, and musical instrument types. For each artist, we count how often the terms from the vocabulary occur on the corresponding Web page (term frequency), which results in a term frequency vector.
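The counting step amounts to scanning the retrieved snippet text against the fixed vocabulary. A toy version follows; the tiny stand-in vocabulary is ours, since the actual 945-term list is not reproduced in the article:

```python
import re

# toy stand-in for the 945-term domain-specific vocabulary
VOCAB = ["rap", "gangsta", "west coast", "mainstream", "classical", "piano"]

def term_frequencies(page_text, vocab=VOCAB):
    """Count how often each vocabulary term occurs on an artist's result
    page, yielding the artist's term frequency vector as a dict."""
    text = page_text.lower()
    return {t: len(re.findall(r"\b" + re.escape(t) + r"\b", text)) for t in vocab}
```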
After obtaining a vector for each artist, we need a strategy for transferring the list of artist-relevant words to specific points on the landscape and for determining those words that discriminate between the music in one region of the map and the music in other regions. For example, "music" is not a discriminating word, since it occurs frequently for all artists. For each unit, we sum up the term frequency vectors of the artists associated with pieces represented by the unit. The result is a frequency vector for each unit.
Using these vectors, we want to find the most descriptive terms for the units. We decided to apply the SOM labeling strategy proposed by Lagus and Kaski.8 Their heuristically motivated scheme exploits knowledge of the SOM's structure to enforce the emergence of areas with coherent descriptions. To this end, the approach accumulates term vectors from directly neighboring units and ignores term vectors from a more distant neutral zone. We calculate the goodness score G2 of a term t as a descriptor for unit u as follows:

G2(t, u) = [ Σ_{k ∈ A0(u)} F(t, k) ]² / Σ_{i ∉ A1(u)} F(t, i)

where k ∈ A0(u) if the (Manhattan) distance of units u and k on the map is below a threshold r0, and i ∈ A1(u) if the distance of u and i is greater than r0 and smaller than some r1 (in our experiments, we set r0 = 1 and r1 = 2). F(t, u) denotes the relative frequency of term t on unit u and is calculated as

F(t, u) = Σ_a f(a, u) · tf(t, a) / Σ_a Σ_v f(a, u) · tf(v, a)

where f(a, u) gives artist a's number of tracks on unit u and tf(t, a) gives artist a's term frequency of term t. Because many neighboring units contain similar descriptions, we try to find coherent parts of the music description map and join them to single clusters. The system then displays the most important terms for each cluster in the center of the cluster; it then randomly distributes the remaining labels across the cluster. The display size of a term corresponds to its score, G2.
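A direct transcription of the scoring, under our reading of the scheme (units within r0 form the numerator, the neutral zone between r0 and r1 is skipped, and units at distance r1 or beyond form the denominator); all data structures here are illustrative, not the article's:

```python
def relative_term_frequency(t, u, tf, f, artists, terms):
    """F(t, u) = sum_a f(a,u)*tf(t,a) / sum_a sum_v f(a,u)*tf(v,a)."""
    num = sum(f[a].get(u, 0) * tf[a].get(t, 0) for a in artists)
    den = sum(f[a].get(u, 0) * tf[a].get(v, 0) for a in artists for v in terms)
    return num / den if den else 0.0

def goodness(t, u, coords, tf, f, artists, terms, r0=1, r1=2):
    """Lagus-Kaski-style goodness score G2(t, u) on the map."""
    def manhattan(p, q):
        return abs(coords[p][0] - coords[q][0]) + abs(coords[p][1] - coords[q][1])
    units = list(coords)
    near = sum(relative_term_frequency(t, k, tf, f, artists, terms)
               for k in units if manhattan(u, k) < r0)
    far = sum(relative_term_frequency(t, i, tf, f, artists, terms)
              for i in units if manhattan(u, i) >= r1)
    # guard: if the term never occurs on far units, the normalization
    # is undefined; we fall back to the unnormalized numerator
    return near ** 2 / far if far else near ** 2
```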
To display images related to the artists and the describing words, we use Google's image search function. For each track, we include an image of the corresponding artist. For each term, we simply use the term itself as a query and randomly select one of the first 10 displayed images.
Implementation remarks

We wrote the software exclusively in Java. We implemented most of the application functionality in our Collection of Music Information Retrieval and Visualization Applications (CoMIRVA) framework, which is published under the GNU General Public License and can be downloaded from http://www.cp.jku.at/comirva. For the realization of the 3D landscape, we use the Xith3D scene graph library (see http://xith.org), which runs on top of Java OpenGL. Spatialized surround sound is realized via Sound3D and Java bindings for OpenAL (see https://joal.dev.java.net). To access the game controller, we use the Joystick Driver for Java (see http://sourceforge.net/projects/javajoystick).
Currently, the software runs on a Windows machine. Because all required libraries are also available for Linux, we plan to port the software to that platform soon.
Qualitative evaluation

We conducted a small user study to gain insights into nepTune's usability. We asked eight people to play with the interface and tell us their impressions. In general, responses were positive. People reported that they enjoyed exploring and listening to a music collection by cruising through a landscape. While many considered the option of displaying related images on the landscape mainly a nice gimmick, many rated the option to display related words as a valuable add-on, even if some of the displayed words were confusing for some users. All users found controlling the application with a gamepad intuitive.
Skeptical feedback was mainly caused by music auralization in areas where different styles collide. However, in general, people rated auralization positively, especially in regions containing electronic dance music, rap and hip-hop, or classical music, because it assists in quickly identifying groups of tracks from the same musical style. Two users suggested creating larger landscapes to allow more focused listening to certain tracks in crowded regions.
Future directions

In its current state, nepTune focuses on interactive exploration rather than on providing full functionality to replace existing music players. However, we can easily extend the application to provide such useful methods as automatic playlist generation. For example, we could let the user determine a start and an end song on the map. Given this information, we can then find a path along the distributed pieces on the map. Furthermore, we can easily visualize such paths and provide some sort of autopilot mode where the movement through the landscape occurs automatically by following the playlist path. By allowing the user to select specific tracks, we could also introduce focused listening and present additional track-specific metadata for the currently selected track. As in other music player applications, we could display further ID3 tags like album or track length, as well as lyrics or album covers.
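One simple way to realize such a start-to-end playlist is our own sketch, not an algorithm from the article: pick tracks whose landscape positions lie closest to evenly spaced waypoints on the line between the start and end songs.

```python
import math

def playlist_path(positions, start, end, length=6):
    """Build a playlist of `length` tracks from `start` to `end` by
    snapping evenly spaced waypoints on the connecting line to the
    nearest unused tracks. `positions` maps track id -> (x, y)."""
    (x0, y0), (x1, y1) = positions[start], positions[end]
    playlist, used = [start], {start}
    for i in range(1, length - 1):
        t = i / (length - 1)
        wx, wy = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
        best = min((p for p in positions if p not in used and p != end),
                   key=lambda p: math.hypot(positions[p][0] - wx,
                                            positions[p][1] - wy))
        playlist.append(best)
        used.add(best)
    playlist.append(end)
    return playlist
```

Because neighboring map positions reflect sound similarity, such a path yields a playlist that drifts gradually from the style of the start song to that of the end song.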
Large collections (containing tens of thousands of tracks) present the biggest challenges. One option would be to incorporate a level-of-detail extension that uses the music descriptors extracted from the Web. At the topmost level, that is, the highest elevation, only broad descriptors like musical styles would be displayed. Reducing the altitude would switch to the next level of detail, making more distinct descriptors appear, along with important artists for that specific region. Single tracks could then be found at the most detailed level. This would emphasize the relatedness of the interface to geographical maps, and the application would act even more as a flight simulator for music landscapes.
Another future application scenario concerns mobile devices. We are developing a version of the nepTune interface that Java 2 Micro Edition-enabled devices can execute. While a personal computer must perform the audio feature extraction step, it's possible to perform the remaining steps, that is, SOM training and landscape creation, on the mobile device. Considering the ongoing trend toward mobile music applications and the necessity of simple interfaces to music collections, the nepTune interface could be a useful and fun-to-use approach for accessing music on portable devices. Figure 5 shows a screen shot of the current prototype.
We can conceive of many alternative ways of accessing music on mobile music players. For instance, we have recently developed another interface, also based on automatic music similarity analysis, that permits the user to quickly locate a particular music style by simply turning
Figure 5. A mobile device running the prototype of the nepTune mobile version. (We postprocessed the display for better visibility.)
a wheel, much like searching for radio stations on a radio.9 Figure 6 shows our current prototype, implemented on an Apple iPod, in which the click wheel helps navigate linearly through the entire music collection, which the computer has arranged according to musical similarity. A number of other research laboratories are also working on novel interfaces.10-12
In general, we believe that intelligent music applications (of which nepTune gives just a tiny glimpse) will change the way people deal with music in the next few years. Computers that learn to understand music in some sense will become intelligent, reactive musical companions. They will help users discover new music; provide informative metainformation about musical pieces, artists, styles, and relations between these; and generally connect music to other modes of information and entertainment (text, images, video, games, and so on). Given the sheer size of the commercial music market, music will be a driving force in this kind of multimedia research. The strong trend toward Web 2.0 that floods the Web with texts, images, videos, and audio files poses enormous technical challenges to multimedia, but also offers exciting new perspectives. There is no doubt that intelligent music processing will become one of the central functions in many future multimedia systems. MM
Acknowledgments

The Austrian Science Fund under FWF project number L112-N04 and the Vienna Science and Technology Fund under WWTF project number CI010 (Interfaces to Music) support this research. We thank the students who implemented vital parts of the project, especially Richard Vogl, who designed the first interface prototype; Klaus Seyerlehner, who implemented high-level feature extractors; Manfred Waldl, who created the prototype of the mobile nepTune version; and Dominik Schnitzer, who realized the intelligent iPod interface.
References
1. E. Pampalk, "Islands of Music: Analysis, Organization, and Visualization of Music Archives," master's thesis, Vienna Univ. of Technology, 2001.
2. M. Mandel and D. Ellis, "Song-Level Features and Support Vector Machines for Music Classification," Proc. 6th Int'l Conf. Music Information Retrieval (ISMIR), 2005, pp. 594-599; http://ismir2005.ismir.net/proceedings/1106.pdf.
3. J.J. Aucouturier, F. Pachet, and M. Sandler, "The Way It Sounds: Timbre Models for Analysis and Retrieval of Music Signals," IEEE Trans. Multimedia, vol. 7, no. 6, 2005, pp. 1028-1035.
4. T. Kohonen, Self-Organizing Maps, vol. 30, 3rd ed., Springer Series in Information Sciences, Springer, 2001.
5. W.P. Tai, "A Batch Training Network for Self-Organization," Proc. 5th Int'l Conf. Artificial Neural Networks (ICANN), vol. II, F. Fogelman-Soulié and P. Gallinari, eds., EC2, 1995, pp. 33-37.
6. E. Pampalk, A. Rauber, and D. Merkl, "Using Smoothed Data Histograms for Cluster Visualization in Self-Organizing Maps," Proc. Int'l Conf. Artificial Neural Networks (ICANN), Springer LNCS, 2002, pp. 871-876.
7. P. Knees et al., "Automatically Describing Music on a Map," Proc. 1st Workshop Learning the Semantics of Audio Signals (LSAS), 2006, pp. 33-42; http://irgroup.cs.uni-magdeburg.de/lsas2006/proceedings/LSAS06_Full.pdf.
8. K. Lagus and S. Kaski, "Keyword Selection Method for Characterizing Text Document Maps," Proc. 9th Int'l Conf. Artificial Neural Networks, vol. 1, IEEE Press, 1999, pp. 371-376.
9. T. Pohle et al., "'Reinventing the Wheel': A Novel Approach to Music Player Interfaces," IEEE Trans. Multimedia, vol. 9, no. 3, 2007, pp. 567-575.
10. M. Goto and T. Goto, "Musicream: New Music Playback Interface for Streaming, Sticking, and Recalling Musical Pieces," Proc. 6th Int'l Conf. Music Information Retrieval (ISMIR), 2005, pp. 404-411; http://ismir2005.ismir.net/proceedings/1058.pdf.
11. E. Pampalk and M. Goto, "MusicRainbow: A New User Interface to Discover Artists Using Audio-Based Similarity and Web-Based Labeling," Proc. 7th Int'l Conf. Music Information Retrieval (ISMIR), 2006, pp. 367-370; http://ismir2006.ismir.net/PAPERS/ISMIR0668_Paper.pdf.
12. R. van Gulik, F. Vignoli, and H. van de Wetering, "Mapping Music in the Palm of your Hand: Explore and Discover your Collection," Proc. 5th Int'l Conf. Music Information Retrieval (ISMIR), 2004, pp. 409-414; http://ismir2004.ismir.net/proceedings/p074-page-409-paper153.pdf.

Figure 6. Prototype of our intelligent music wheel interface, implemented on an iPod.
Peter Knees is a project assistant in the Department of Computational Perception, Johannes Kepler University Linz, Austria. His research interests include music information retrieval, Web mining, and information retrieval. Knees has a Dipl. Ing. (MS) in computer science from the Vienna University of Technology and is currently working on a PhD in music information retrieval from Johannes Kepler University Linz.
Markus Schedl is working on his doctoral thesis in computer science at the Department of Computational Perception, Johannes Kepler University Linz. His research interests include Web mining, (music) information retrieval, information visualization, and intelligent user interfaces. Schedl graduated in computer science from the Vienna University of Technology.
Tim Pohle is pursuing a PhD at Johannes Kepler University Linz, where he also works as a research assistant in music information retrieval with a special emphasis on audio-based techniques. His research interests include musicology and computer science. Pohle has a Dipl. Inf. degree from the Technical University Kaiserslautern, Germany.
Gerhard Widmer is a professor and head of the Department of Computational Perception at the Johannes Kepler University Linz, and head of the Intelligent Music Processing and Machine Learning Group at the Austrian Research Institute for Artificial Intelligence, Vienna. His research interests include machine learning, pattern recognition, and intelligent music processing. Widmer has MS degrees from the University of Technology, Vienna, and the University of Wisconsin, Madison, and a PhD in computer science from the University of Technology, Vienna. In 1998, he was awarded one of Austria's highest research prizes, the Start Prize, for his work on AI and music.
Readers may contact Peter Knees at the Dept. of Computational Perception, Johannes Kepler University Linz, Altenberger Str. 69, 4040 Linz, Austria.
For further information on this or any other computing topic, please visit our Digital Library at http://computer.org/publications/dlib.