PICASSO – To Sing you must Close Your Eyes and Draw
Seminar Informatik in den Medien
Eloy Rodríguez Rey
PICASSO
● What?
– PIcture CAtegorization for Suggesting SOundtracks
● Why?
– Problem of proposing a soundtrack to a picture or a group of pictures
● How?
– Training data (40,000 image/soundtrack samples from 28 movies)
– Three-level algorithm
Previous Works
● Start with a soundtrack and then find the appropriate images
● Focus on impressionist paintings – emotions
● Suggest music for driving scenery
● Align the video transitions with the transitions in a given music piece
Technical Background
● Low-level features for both image-to-image and song-to-song similarity measures
– Image-to-image similarity
  ● MPEG-7 color
  ● Texture low-level features
– Song-to-song similarity
  ● Spectral shape
  ● Temporal low-level features
Training Database
Figure 1: PICASSO – To Sing you must Close your Eyes and Draw. Stupar, Aleksandar; Sebastian, Michel
Music/Speech Classification
● Naïve Bayes classifier
– 64 speech samples
– 64 music samples
● Low-level features
– Training the classifier
– Classification task
● Marsyas tool
– Feature extraction
– Classification
Music/Speech Classification
● Output of the classifier for each second of the soundtrack:
– Label: “music” or “speech”
– Confidence value
● Only musical parts of the soundtrack are kept
– Musical parts with a confidence value above 95% and a length of more than 5 seconds
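The filtering rule above can be sketched in a few lines. This is only an illustration, not the authors' code: the function name `musical_segments`, the `(label, confidence)` tuple layout, and the half-open `(start, end)` ranges are assumptions made for the example.

```python
from typing import List, Tuple


def musical_segments(
    labels: List[Tuple[str, float]],
    min_conf: float = 0.95,
    min_len: int = 5,
) -> List[Tuple[int, int]]:
    """Return (start, end) second ranges of confident music longer than min_len.

    labels[t] is the (hypothetical) classifier output for second t:
    a label ("music" or "speech") and a confidence in [0, 1].
    Ranges are half-open: the segment covers seconds start .. end - 1.
    """
    segments = []
    start = None  # start second of the run currently being tracked
    for t, (label, conf) in enumerate(labels):
        is_music = label == "music" and conf > min_conf
        if is_music and start is None:
            start = t  # a new confident music run begins
        elif not is_music and start is not None:
            if t - start > min_len:  # keep only runs longer than min_len seconds
                segments.append((start, t))
            start = None
    # Close a run that lasts until the end of the soundtrack.
    if start is not None and len(labels) - start > min_len:
        segments.append((start, len(labels)))
    return segments


if __name__ == "__main__":
    demo = (
        [("music", 0.99)] * 6     # kept: 6 confident seconds
        + [("speech", 0.99)]
        + [("music", 0.99)] * 3   # dropped: too short
        + [("music", 0.50)]       # dropped: low confidence
        + [("music", 0.97)] * 7   # kept: 7 confident seconds
    )
    print(musical_segments(demo))
```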
Scene Detection
● The sequence of screenshots is split at positions where the image-to-image distance is larger than a given threshold
● The sequence of screenshots from one split to the next is considered a scene
● Considerations:
– Short musical parts are eliminated
– Scenes shorter than 5 seconds are discarded
– Scenes longer than 10 seconds are split into multiple parts
Figure 2: PICASSO – To Sing you must Close your Eyes and Draw. Stupar, Aleksandar; Sebastian, Michel
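A minimal sketch of the splitting rule, under stated assumptions: one screenshot per second, a hypothetical function name `detect_scenes`, and long scenes cut into chunks of at most 10 screenshots with any leftover chunk shorter than 5 dropped. The slides do not say how long scenes are divided, so that part is a guess.

```python
from typing import List, Tuple


def detect_scenes(
    distances: List[float],
    threshold: float,
    min_len: int = 5,
    max_len: int = 10,
) -> List[Tuple[int, int]]:
    """Split a screenshot sequence into scenes.

    distances[i] is the image-to-image distance between screenshot i and
    screenshot i + 1 (assumption: one screenshot per second).
    Returns half-open (start, end) index ranges.
    """
    n = len(distances) + 1  # number of screenshots
    # A cut is placed before screenshot i + 1 when the jump is too large.
    cuts = [0] + [i + 1 for i, d in enumerate(distances) if d > threshold] + [n]
    scenes = []
    for start, end in zip(cuts, cuts[1:]):
        if end - start < min_len:
            continue  # discard scenes shorter than min_len
        # Assumed policy: cut long scenes into chunks of at most max_len.
        for s in range(start, end, max_len):
            e = min(s + max_len, end)
            if e - s >= min_len:
                scenes.append((s, e))
    return scenes


if __name__ == "__main__":
    # 12 similar screenshots, a cut, 3 screenshots, a cut, 6 screenshots.
    d = [0.1] * 11 + [0.9] + [0.1] * 2 + [0.9] + [0.1] * 5
    print(detect_scenes(d, threshold=0.5))
```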
Image Similarity Measure
● The following MPEG-7 feature vectors are used:
– Scalable color
– Color structure
– Color layout
– Edge histogram
Image Similarity Measure
● Color structure: describes all the colors found in the image by aggregating them in a color histogram
Figure 3: https://documentation.apple.com/en/color/usermanual/Art/S02/S0208_RGBHistogram.png
Image Similarity Measure
● All distance calculations are combined in one distance measurement:
1) Calculating the standard score (z-score) for each of the descriptors
2) Summing up all of the standard scores into a single score
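The two steps can be illustrated as follows. This sketch assumes that the mean and standard deviation of each descriptor are taken over the distances from one query to all candidate images; the slides do not state over which population the standard score is computed. The function name `combine_distances` and the descriptor keys are made up for the example.

```python
from statistics import mean, pstdev
from typing import Dict, List


def combine_distances(per_descriptor: Dict[str, List[float]]) -> List[float]:
    """Combine per-descriptor distances into one score per candidate image.

    per_descriptor maps a descriptor name to the list of distances between
    the query and every candidate image (same order in every list).
    """
    n = len(next(iter(per_descriptor.values())))
    combined = [0.0] * n
    for dists in per_descriptor.values():
        mu = mean(dists)
        sigma = pstdev(dists)  # population standard deviation
        for i, d in enumerate(dists):
            # Step 1: standard score of this distance for this descriptor.
            z = (d - mu) / sigma if sigma > 0 else 0.0
            # Step 2: sum the standard scores over all descriptors.
            combined[i] += z
    return combined


if __name__ == "__main__":
    scores = combine_distances({
        "scalable_color": [1.0, 2.0, 3.0],
        "edge_histogram": [10.0, 30.0, 20.0],
    })
    print(scores)  # the lowest combined score marks the most similar image
```

Standardizing first keeps a descriptor with a large numeric range (here the hypothetical `edge_histogram` values) from dominating the sum.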
Music Similarity Measure
● The following low-level musical descriptors are used:
– MFCC
– Chroma
– Spectral centroid
– Spectral rolloff
– Spectral flux
– Time-domain zero crossings
Music Similarity Measure
● Spectral centroid: center of gravity of a musical signal's spectral representation
Figure 4: http://w3.impa.br/~cicconet/audiofeature/Feature_SpectralCentroid.png
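As a worked illustration of the "center of gravity" definition, the centroid is the magnitude-weighted mean of the frequencies in a frame's spectrum. The sketch below uses a naive DFT so that it has no dependencies; the function names are assumptions, and a real system would rely on an FFT or on a tool such as Marsyas.

```python
import cmath
import math
from typing import List


def magnitude_spectrum(frame: List[float]) -> List[float]:
    """Magnitudes of DFT bins 0 .. n/2 of a real-valued frame (naive O(n^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame)))
        for k in range(n // 2 + 1)
    ]


def spectral_centroid(frame: List[float], sample_rate: float) -> float:
    """Magnitude-weighted mean frequency (in Hz) of the frame's spectrum."""
    mags = magnitude_spectrum(frame)
    total = sum(mags)
    if total == 0:
        return 0.0  # silent frame: no meaningful centroid
    n = len(frame)
    freqs = [k * sample_rate / n for k in range(len(mags))]
    return sum(f * m for f, m in zip(freqs, mags)) / total


if __name__ == "__main__":
    sr, n = 8000, 64
    # A pure 1000 Hz tone falls exactly on one DFT bin for these settings.
    tone = [math.sin(2 * math.pi * 1000 * t / sr) for t in range(n)]
    print(spectral_centroid(tone, sr))
```

For a pure tone the centroid sits at the tone's frequency; brighter sounds with more high-frequency energy push it upward.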
Music Similarity Measure [3]
● To calculate the similarity between two songs:
1) Feature vectors for each descriptor are extracted
2) Pairwise similarity between these vectors is calculated and combined
● Pairwise similarity alone is not enough, because music also has a time dimension
– Dynamic Time Warping (DTW)
Music Similarity Measure
● Dynamic Time Warping (DTW): enables matching of sequences that vary in speed
Figure 5: http://america.pink/images/1/3/3/9/5/2/7/en/3-dynamic-time-warping.jpg
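A minimal sketch of the textbook DTW recurrence, shown on one-dimensional sequences with absolute difference as the local cost. How PICASSO applies DTW to its feature vectors is not detailed on the slide, so the local cost function and the name `dtw_distance` are assumptions.

```python
from typing import Callable, Sequence


def dtw_distance(
    a: Sequence[float],
    b: Sequence[float],
    dist: Callable[[float, float], float] = lambda x, y: abs(x - y),
) -> float:
    """Cost of the cheapest monotonic alignment between sequences a and b."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] = best alignment cost of a[:i] against b[:j]
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(
                cost[i - 1][j],      # a advances, b repeats an element
                cost[i][j - 1],      # b advances, a repeats an element
                cost[i - 1][j - 1],  # both advance
            )
    return cost[n][m]


if __name__ == "__main__":
    # The same melody played at half speed still matches perfectly.
    print(dtw_distance([1, 2, 3], [1, 1, 2, 2, 3, 3]))  # 0.0
    print(dtw_distance([1, 2, 3], [2, 3, 4]))           # 2.0
```

The first call shows why DTW suits music: stretching a sequence in time does not increase the distance, whereas a frame-by-frame comparison would.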
Music Similarity Measure
● The sum of distances between the soundtrack sample and these three positions in the song is used as the resulting distance
Figure 6: PICASSO – To Sing you must Close your Eyes and Draw. Stupar, Aleksandar; Sebastian, Michel
Single Image Recommendation
● Two phases of K-nearest neighbor searches: first, in the image domain; and second, between musical pieces
– Phase 1. When the query is submitted, its distance to each of the images in the training dataset is calculated
– Phase 2. After the top-K images are found, the list of songs with their scores is retrieved for each of these images
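The two phases could look roughly like the sketch below. Several things are assumptions for illustration only: images are plain feature vectors compared with Euclidean distance (instead of the combined MPEG-7 score), the per-image song lists are precomputed in a dictionary named `songs_for_image`, and the lists are merged by keeping each song's best (lowest) score. The slides do not specify the merge rule.

```python
import math
from typing import Dict, List, Sequence, Tuple


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def recommend(
    query: Sequence[float],
    training_images: Dict[str, Sequence[float]],
    songs_for_image: Dict[str, List[Tuple[str, float]]],
    k: int = 2,
) -> List[str]:
    """Return song ids ordered from best to worst for the query image."""
    # Phase 1: distance from the query to every training image, keep the top K.
    ranked = sorted(training_images, key=lambda i: euclidean(query, training_images[i]))
    top_k = ranked[:k]
    # Phase 2: fetch the (song, score) list of each top-K image and merge.
    best: Dict[str, float] = {}
    for image_id in top_k:
        for song, score in songs_for_image[image_id]:
            # Assumed merge rule: keep the lowest distance seen for a song.
            if song not in best or score < best[song]:
                best[song] = score
    return sorted(best, key=lambda s: (best[s], s))


if __name__ == "__main__":
    images = {"a": [0, 0], "b": [1, 0], "c": [10, 10]}
    songs = {
        "a": [("s1", 0.2), ("s2", 0.5)],
        "b": [("s2", 0.1), ("s3", 0.4)],
        "c": [("s4", 0.0)],
    }
    print(recommend([0.1, 0], images, songs, k=2))  # ['s2', 's1', 's3']
```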
Multiple Images
● Group these images using a clustering algorithm
● Recommend a soundtrack for each of the groups
– Average position
– Least misery
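The two aggregation strategies can be sketched as follows, reading "average position" as the mean rank of a song across the per-image recommendation lists of a group and "least misery" as its worst rank. This reading follows common usage in group recommendation and is an assumption here; the function name `aggregate` and the requirement that every list ranks the same songs are also assumptions.

```python
from typing import Dict, List


def aggregate(rankings: List[List[str]], strategy: str = "average") -> List[str]:
    """Merge per-image song rankings of one image group into a single ranking.

    rankings holds one ordered song list per image (best song first).
    """
    positions: Dict[str, List[int]] = {}
    for ranking in rankings:
        for pos, song in enumerate(ranking, start=1):
            positions.setdefault(song, []).append(pos)
    if strategy == "average":
        # Average position: mean rank over all images of the group.
        score = {s: sum(p) / len(p) for s, p in positions.items()}
    elif strategy == "least_misery":
        # Least misery: a song is only as good as its worst rank.
        score = {s: float(max(p)) for s, p in positions.items()}
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    # Lower score is better; ties are broken alphabetically.
    return sorted(score, key=lambda s: (score[s], s))


if __name__ == "__main__":
    group = [
        ["x", "y", "z", "w"],
        ["x", "y", "z", "w"],
        ["y", "z", "w", "x"],
    ]
    print(aggregate(group, "average"))       # ['y', 'x', 'z', 'w']
    print(aggregate(group, "least_misery"))  # ['y', 'z', 'w', 'x']
```

In the example, song `x` is ranked first for two images but last for the third, so it does well on average and poorly under least misery.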
Evaluation
● Single image
1) Grade the 1st-ranked recommended soundtrack
2) Grade the 10th-ranked recommended soundtrack
3) Grade a randomly chosen soundtrack
● Multiple images
1) Evaluation of the average position and least misery approaches
2) Random recommendation
● Dataset is obtained by downloading songs from the music2ten site
Evaluation
Figure 7: PICASSO – To Sing you must Close your Eyes and Draw. Stupar, Aleksandar; Sebastian, Michel
Figure 8: PICASSO – To Sing you must Close your Eyes and Draw. Stupar, Aleksandar; Sebastian, Michel
Evaluation
● The runtime of the query processing was measured too
Figure 9: PICASSO – To Sing you must Close your Eyes and Draw. Stupar, Aleksandar; Sebastian, Michel
Conclusion
● Automated approach to recommend a soundtrack for a picture or a series of pictures
● Extraction of knowledge from popular, publicly available movies, and how this information can be used
● PICASSO is based on the usage of low-level features for similarity comparison between images and between songs
Media vs. Paper
                     Media            Paper
Level of detail      Lower            Higher
Oriented to          General public   People with prior knowledge
Level of language    Lower            Higher
Size                 Shorter          Longer
Citations            Rarely           Always