Tim Pohle, Peter Knees, Markus Schedl, Elias Pampalk, and Gerhard Widmer
IEEE Transactions on Multimedia, Vol 9, No. 3, April 2007
Presented by Yi-Tang Wang
Outline
Introduction
Audio-Based Similarity
Web-Based Similarity
Problem Modeling
Evaluation and Results
Conclusion & future work
Introduction
A novel music player interface using a wheel
Generating a circular playlist from personal repositories
Keeps on playing similar tracks
Not only audio-based similarity is used, but also text-based similarity
Audio-Based Similarity
MFCCs (Mel-frequency cepstral coefficients)
Discarding the higher-order MFCCs is beneficial for the ability to compare different frames, but possibly at the cost of discarding musically meaningful information.
Audio-Based Similarity
The wave files were downsampled to 22 kHz
19 MFCCs per frame
The temporal order of the frames is ignored
The distribution of the MFCCs is modeled with a Gaussian mixture model (GMM)
Audio-Based Similarity
Similarity between pieces of music: compute the distance between two GMMs
Likelihood: compute the probability that the MFCCs of song A were generated by the model of song B
Drawback: all MFCCs need to be stored
Audio-Based Similarity
Sampling
Only the GMM parameters are stored, instead of the MFCCs
Sample from one GMM and compute the likelihood of the samples given the other GMM
Corresponds roughly to re-creating a song
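The sampling idea can be illustrated with a minimal sketch. Several things here are assumptions that do not come from the slides: the GMMs use diagonal covariances, the function names are hypothetical, and the symmetrized combination of self- and cross-likelihoods is one common way to turn likelihoods into a distance, not necessarily the one used in the paper.

```python
import math
import random

# A GMM is a list of (weight, means, variances) tuples with diagonal
# covariance; this representation is an assumption made for the sketch.

def log_density(gmm, x):
    """Log-density of the point x under a diagonal-covariance GMM."""
    component_logs = []
    for weight, means, variances in gmm:
        log_p = math.log(weight)
        for xi, mu, var in zip(x, means, variances):
            log_p -= 0.5 * (math.log(2 * math.pi * var) + (xi - mu) ** 2 / var)
        component_logs.append(log_p)
    peak = max(component_logs)  # log-sum-exp for numerical stability
    return peak + math.log(sum(math.exp(v - peak) for v in component_logs))

def draw_samples(gmm, n, rng):
    """Draw n points from the GMM: pick a component, then sample from it."""
    weights = [weight for weight, _, _ in gmm]
    samples = []
    for _ in range(n):
        _, means, variances = rng.choices(gmm, weights=weights)[0]
        samples.append([rng.gauss(mu, math.sqrt(var))
                        for mu, var in zip(means, variances)])
    return samples

def mean_log_likelihood(source, target, n, rng):
    """Average log-likelihood, under target, of n samples drawn from source."""
    return sum(log_density(target, x) for x in draw_samples(source, n, rng)) / n

def sampling_distance(gmm_a, gmm_b, n=2000, seed=0):
    """Symmetrized distance: self-likelihoods minus cross-likelihoods."""
    rng = random.Random(seed)
    cross = (mean_log_likelihood(gmm_a, gmm_b, n, rng)
             + mean_log_likelihood(gmm_b, gmm_a, n, rng))
    own = (mean_log_likelihood(gmm_a, gmm_a, n, rng)
           + mean_log_likelihood(gmm_b, gmm_b, n, rng))
    return own - cross
```

Only the model parameters are needed at comparison time, which is the storage advantage over the plain likelihood approach. The result is a Monte Carlo estimate, so it fluctuates slightly around zero when a model is compared with itself.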
Web-Based Similarity
Cultural, social, historical, and contextual aspects should be taken into account
Information from the WWW
Query Google with the artist's name + "music"
The 50 top-ranked pages are retrieved
Remove all terms that occur on fewer than c pages, with c chosen such that about 10000 terms remain
Web-Based Similarity
Term frequency $tf_{ta}$ (a: artist, t: term): number of occurrences of t in documents related to a
Document frequency $df_t$: number of pages in which t occurs
Term weight per artist: term frequency × inverse document frequency
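A minimal sketch of the term weighting follows. The slide only states "term frequency × inverse document frequency", so the log-damped variant used here, $(1 + \log_2 tf_{ta}) \cdot \log_2(N / df_t)$ with N the total number of pages, is an assumption, as are the function name and the input layout.

```python
import math
from collections import Counter

def tfidf_weights(pages_by_artist):
    """Map {artist: [tokenized page, ...]} to {artist: {term: weight}}.

    Assumed weighting: (1 + log2(tf_ta)) * log2(N / df_t).
    """
    total_pages = sum(len(pages) for pages in pages_by_artist.values())

    # df_t: number of pages (over all artists) in which term t occurs.
    df = Counter()
    for pages in pages_by_artist.values():
        for page in pages:
            df.update(set(page))

    weights = {}
    for artist, pages in pages_by_artist.items():
        # tf_ta: occurrences of term t in all pages retrieved for artist a.
        tf = Counter(term for page in pages for term in page)
        weights[artist] = {
            term: (1 + math.log2(count)) * math.log2(total_pages / df[term])
            for term, count in tf.items()
        }
    return weights
```

With this variant, a term that occurs on every retrieved page gets weight zero, so ubiquitous words such as "music" do not influence the artist vectors.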
Web-Based Similarity
Each artist is described by a vector of term weights
Cosine normalization is applied to each vector
The Euclidean distance between two vectors is a simple similarity measure
In this paper, a SOM is used as the similarity measure
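As a small illustration of the first two points, the sketch below normalizes sparse term-weight vectors to unit length and compares them with the Euclidean distance. Representing a vector as a `{term: weight}` dictionary and the function names are assumptions made for the example.

```python
import math

def cosine_normalize(weights):
    """Scale a sparse term-weight vector to unit Euclidean length."""
    norm = math.sqrt(sum(w * w for w in weights.values()))
    if norm == 0.0:
        return dict(weights)  # an all-zero vector cannot be normalized
    return {term: w / norm for term, w in weights.items()}

def euclidean_distance(u, v):
    """Euclidean distance between two sparse vectors (missing terms count as 0)."""
    terms = set(u) | set(v)
    return math.sqrt(sum((u.get(t, 0.0) - v.get(t, 0.0)) ** 2 for t in terms))
```

After normalization, the squared Euclidean distance equals 2 − 2·cos(u, v), so artists whose term profiles differ only in overall scale end up at distance zero.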
Web-Based Similarity - SOM
SOM: Self-Organizing Map
A subtype of artificial neural networks
It is trained using unsupervised learning
It produces a low-dimensional representation of the training samples while preserving the topological properties of the input space
In this paper, a rectangular 2-D grid is used for text-based similarity between songs
Web-Based Similarity - SOM
A SOM consists of units
A model vector in the high-dimensional input data space is assigned to each of the units
Model vectors that belong to units close to each other on the 2-D grid are also close to each other in the data space
Training chooses the model vectors
Web-Based Similarity - SOM
Batch-SOM algorithm
Initialization: randomly initialize the model vectors
1st step: for each data item $x_i$, the Euclidean distance between $x_i$ and each model vector is calculated, and $x_i$ is assigned to the unit $c_i$ that represents it best
Web-Based Similarity - SOM
2nd step
The neighborhood relationship between two units is usually defined by a Gaussian-like function
$h_{jk} = \exp\left(-d_{jk}^2 / r_t^2\right)$
$d_{jk}$: distance between units j and k on the map; $r_t$: neighborhood radius
$r_t$ decreases with each iteration (the adaptation strength decreases gradually)
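The two steps can be put together in a minimal batch-SOM sketch. The slides do not show the update rule itself, so the sketch assumes the standard batch update, in which each model vector becomes the $h_{jk}$-weighted mean of all data items, where k is the unit an item was assigned to in the 1st step. The linear radius schedule, the initialization inside the bounding box of the data, and all function names are likewise assumptions.

```python
import math
import random

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def best_matching_unit(models, x):
    """Index of the unit whose model vector is closest to x (1st step)."""
    return min(range(len(models)), key=lambda j: squared_distance(models[j], x))

def train_batch_som(data, rows, cols, iterations=20, seed=0):
    rng = random.Random(seed)
    dim = len(data[0])
    units = [(r, c) for r in range(rows) for c in range(cols)]
    lows = [min(x[k] for x in data) for k in range(dim)]
    highs = [max(x[k] for x in data) for k in range(dim)]
    # Initialization: random model vectors inside the bounding box of the data.
    models = [[rng.uniform(lows[k], highs[k]) for k in range(dim)] for _ in units]
    r_start, r_end = max(rows, cols) / 2.0, 0.5  # assumed radius schedule
    for t in range(iterations):
        # The neighborhood radius r_t shrinks linearly from r_start to r_end.
        r_t = r_start + (r_end - r_start) * t / max(1, iterations - 1)
        # 1st step: assign every data item to its best-matching unit.
        winners = [best_matching_unit(models, x) for x in data]
        # 2nd step: each model vector becomes a neighborhood-weighted mean.
        for j, (row_j, col_j) in enumerate(units):
            numerator, denominator = [0.0] * dim, 0.0
            for x, c in zip(data, winners):
                d2 = (units[c][0] - row_j) ** 2 + (units[c][1] - col_j) ** 2
                h = math.exp(-d2 / r_t ** 2)  # h_jk = exp(-d_jk^2 / r_t^2)
                denominator += h
                for k in range(dim):
                    numerator[k] += h * x[k]
            if denominator > 0.0:
                models[j] = [v / denominator for v in numerator]
    return units, models
```

After training, `best_matching_unit(models, x)` gives the grid unit of a data item, and `units` translates that index into a grid position.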
Web-Based Similarity - SOM
Two artists are similar if they are mapped to the same or adjacent units
Newer experiments have shown that a 6 × 6 grid might be better for this collection
Combining the two approaches
A constant value is added to the audio-based distance matrix for all songs of dissimilar artists
The constant is half of the maximum audio-based distance
This adds a penalty to transitions between songs by dissimilar artists
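The combination can be sketched as follows, assuming the audio-based distances are given as a square matrix and each artist has a SOM grid position. The helper names, the data layout, and the choice to count diagonal neighbours as adjacent are assumptions made for the illustration.

```python
def units_adjacent(unit_a, unit_b):
    """True if two SOM grid positions are identical or neighbouring.

    Counting diagonal neighbours as adjacent is an assumption.
    """
    return abs(unit_a[0] - unit_b[0]) <= 1 and abs(unit_a[1] - unit_b[1]) <= 1

def combine_distances(audio_dist, track_artist, artist_unit):
    """Add half the maximum audio distance to pairs of dissimilar artists.

    audio_dist   : square matrix of audio-based track distances
    track_artist : artist name for each track index
    artist_unit  : SOM grid position (row, col) for each artist
    """
    penalty = 0.5 * max(max(row) for row in audio_dist)
    combined = [list(row) for row in audio_dist]
    for i, artist_i in enumerate(track_artist):
        for j, artist_j in enumerate(track_artist):
            dissimilar = not units_adjacent(artist_unit[artist_i],
                                            artist_unit[artist_j])
            if i != j and dissimilar:
                combined[i][j] += penalty
    return combined
```

Distances between tracks by similar artists stay untouched, so the audio-based ordering is preserved within a web-based neighbourhood.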
Previous work
Audio-based similarity: Fluctuation Patterns
SOM used only on audio-based data
SOM labeled with information from the WWW
A 3-D browsing system
P. Knees, M. Schedl, T. Pohle, and G. Widmer, “An Innovative Three Dimensional User Interface for Exploring Music Collections Enriched with Meta-Information from the Web,” ACM MM’06
Problem Modeling
Map the playlist generation problem to the Traveling Salesman Problem (TSP)
The cities correspond to the tracks in the collection
The distances are determined by the similarities between the tracks
Finding an optimal route = producing a circular playlist
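To make the mapping concrete, the sketch below scores a circular playlist as the length of the closed tour through a distance matrix and finds the optimum by brute force. The brute-force search is only there to illustrate the objective on a handful of tracks; it is not one of the algorithms discussed in the talk, and the function names are hypothetical.

```python
from itertools import permutations

def tour_length(dist, tour):
    """Total distance of a circular playlist, including the closing transition."""
    return sum(dist[tour[k]][tour[(k + 1) % len(tour)]] for k in range(len(tour)))

def optimal_tour(dist):
    """Exhaustive search over all tours starting at track 0 (tiny inputs only)."""
    n = len(dist)
    best = min(permutations(range(1, n)),
               key=lambda rest: tour_length(dist, (0,) + rest))
    return [0] + list(best)
```

Because the number of tours grows factorially with the number of tracks, collections of thousands of tracks need heuristics.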
TSP Problem
Greedy algorithm: all edges are examined in order of increasing length and added to the route whenever the result can still be completed to a valid tour
Minimum spanning tree (MinSpan): find a minimum spanning tree, traverse it depth-first, and connect the nodes in the order they are first visited
LKH: Lin-Kernighan algorithm, proposed in 1971; starts with a randomly generated tour, then deletes edges from the route and recombines the remaining tour fragments
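A minimal sketch of the greedy heuristic, under the usual reading of adding an edge "properly": an edge is accepted only if neither endpoint already has two tour edges and it does not close a cycle before all tracks are connected. The function name and the union-find bookkeeping are implementation choices made for this illustration.

```python
def greedy_tour(dist):
    """Greedy-edge heuristic for a symmetric distance matrix."""
    n = len(dist)
    if n < 3:
        return list(range(n))
    edges = sorted((dist[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    degree = [0] * n
    neighbours = [[] for _ in range(n)]
    parent = list(range(n))  # union-find forest over tour fragments

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    accepted = 0
    for _, i, j in edges:
        if degree[i] >= 2 or degree[j] >= 2:
            continue  # a track can have only two neighbours in the tour
        root_i, root_j = find(i), find(j)
        if root_i == root_j and accepted < n - 1:
            continue  # would close a cycle before all tracks are included
        parent[root_i] = root_j
        degree[i] += 1
        degree[j] += 1
        neighbours[i].append(j)
        neighbours[j].append(i)
        accepted += 1
        if accepted == n:
            break

    # Walk around the cycle, starting at track 0, to obtain the playlist order.
    tour, previous, current = [0], None, 0
    while len(tour) < n:
        following = next(v for v in neighbours[current] if v != previous)
        tour.append(following)
        previous, current = current, following
    return tour
```

The short edges picked early give very similar neighbouring tracks, while the long edges needed to close the tour at the end are where abrupt transitions can appear.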
TSP Problem
One-dimensional SOM
Train a 1-D cyclic SOM to obtain a circular playlist
As many units as tracks?
Recursive approach
Subtours are combined in a greedy manner
Evaluation & Results
Collection 1
2545 tracks, 13 genres: A Cappella (4.4%), Acid Jazz (2.7%), Blues (2.5%), Bossa Nova (2.8%), Celtic (5.2%), Electronica (21.1%), Folk Rock (9.4%), Italian (5.6%), Jazz (5.3%), Metal (16.1%), Punk Rock (10.2%), Rap (12.9%), and Reggae (1.8%)
103 artists; per artist, minimum 8 tracks, maximum 61 tracks
Evaluation & Results
Collection 2
3456 tracks, 7 genres: Classical (14.7%), Dance (15.0%), Hip-Hop (14.5%), Jazz (13.6%), Metal (14.9%), Pop (11.6%), and Punk (15.6%)
339 artists; per artist, minimum 1 track, maximum 317 tracks
Fluctuations Between Genres
A Cappella, Acid Jazz, Blues, Bossa Nova, Celtic, Electronica, Folk Rock, Italian, Jazz, Metal, Punk Rock, Rap, and Reggae (collection 1)
Shannon Entropy
Estimates how locally coherent a playlist is
Count how many of n consecutive tracks belong to each genre and compute the entropy of this genre distribution
n = 2…12 (a typical album contains about 12 tracks)
Average over the whole playlist
SOM yields better results on web-enhanced data than LKH on audio-only data
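The measure can be sketched as below: for every start position, the genre distribution of n consecutive tracks is turned into a Shannon entropy, and the values are averaged over the playlist. Wrapping around the end of the playlist (because it is circular) and using base-2 logarithms are assumptions, as are the function names.

```python
import math
from collections import Counter

def window_entropy(genres):
    """Shannon entropy (in bits) of the genre distribution in one window."""
    total = len(genres)
    return -sum((count / total) * math.log2(count / total)
                for count in Counter(genres).values())

def mean_local_entropy(playlist_genres, n):
    """Average entropy over all windows of n consecutive tracks.

    The playlist is treated as circular, so windows wrap around the end.
    """
    size = len(playlist_genres)
    total = 0.0
    for start in range(size):
        window = [playlist_genres[(start + k) % size] for k in range(n)]
        total += window_entropy(window)
    return total / size
```

Lower values mean that neighbouring tracks tend to share a genre: a playlist that alternates between two genres has an average entropy of 1 bit for n = 2, while one that groups each genre into a single block stays close to 0.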
Shannon Entropy
Long-Term Consistency
SOM algorithm on combined data
Long-Term Consistency
MinSpan algorithm on audio similarity data
Long-Term Consistency
Greedy algorithm on audio similarity data
Long-Term Consistency
User Study
10 test persons, using collection 2
Create a large playlist
Extract 10 seed tracks: randomly choose a start point, then select tracks at intervals of 3 degrees
Generate two playlists: one by adding the next nine tracks, one by randomly choosing tracks from the same genre
User Study
Users rate each playlist from 1 to 5
The rating scores are summed up
The difference $tsp_{i,j} - gen_{i,j}$ is calculated (i: playlist number, j: user)
User Interface
User Interface
The user interface is very intuitive and extremely easy to handle
Apple’s iPod
Users’ opinions
A scanning function to skip 10 seconds when pressing
Genres containing only a few tracks are quite difficult to locate
Not usable for finding a specific track
Summary of Evaluation Results
All TSP algorithms provided better results with respect to our playlist evaluation criteria when using the web-based extension
The combined similarity measure reduces the number of unexpected placements of tracks in the playlist
Summary of Evaluation Results
LKH and greedy algorithm: best small-scale genre entropy values; large-scale genre distributions are quite fragmented
SOM-based algorithm: highest entropy values; the least fragmented long-term genre distributions
MinSpan algorithm: in the middle field regarding the entropy values
Conclusion & future work
A new approach to conveniently access the music stored in mobile sound players
The whole collection is ordered in a circular playlist and thus accessible with only one input wheel
Two different similarity measures: one relying on timbre information, the other on a combination of timbre and community metadata gathered from artist-related web pages
Conclusion & future work
Problems to solve
It is not possible to precisely select a desired piece: only tracks that are representative of a region are selectable
Possible remedy: zooming or hierarchical structuring techniques
The user does not know in advance which region on the wheel contains which style of music
Conclusion & future work
M. Schedl, T. Pohle, P. Knees, and G. Widmer, “Assigning and visualizing music genres by web-based co-occurrence analysis,” in Proc. 7th Int. Conf. Music Information Retrieval (ISMIR’06), Victoria, Canada, Oct. 2006.
Thank You