pitch analysis for active music discovery - justin salamon...1 pitch analysis for active music...

1

Pitch Analysis for Active Music Discovery

1Music and Audio Research Laboratory2Center for Urban Science and Progress

New York University, USA

Thursday June 23rd 2016

Justin Salamon1,2

@justin_salamon [email protected] www.justinsalamon.com

Collaborators: Juan Pablo Bello, Rachel Bittner, Jordi Bonada, Juan J. Bosch, Jose Miguel Diaz-Bañez, Chris Cannam, Francisco Escobar, Slim Essid, Emilia Gómez, Paco Gómez, Sankalp Gulati, Helga Jiang, Edith Law, Matthias Mauch, Joaquin Mora, Sergio Oramas, Geoffroy Peeters, Aggelos Pikrakis, Axel Röbel, Bruno Rocha, Joan Serrà, Xavier Serra, Tim Tse, Alex Williams

In A Nutshell 2/35

In A Nutshell

‣  GOAL: active music discovery

3/35

In A Nutshell


‣  MEANS: pitch content analysis

4/35

In A Nutshell


‣  MEANS: pitch content analysis

‣  CHALLENGE: data scarcity (and solution!)

5/35

Active Music Discovery

Music Recommendation 7/35

Music Recommendation 8/35

Active Search & Discovery

9/35


10/35Song A

Song B

Versions?

Yes! Similarity = 0.92

COVER SONG ID

(Salamon, Serrà & Gómez, 2013)


11/35Song A

Song B

Versions?


COVER SONG ID


Query-by-humming



12/35Song A

Song B

Versions?


COVER SONG ID


Query-by-humming


melodic pattern discovery

(Pikrakis et al., 2012)


13/35Song A

Song B

Versions?


COVER SONG ID


Query-by-humming


100 150 200 250 300 350 400

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Frequency bins (1 bin = 10 cents), Ref: 55Hz

Norm

aliz

ed s

alie

nce

Multipitch Histogram

ρ2

ρ4 ρ3

ρ5

Tonic ID

(Salamon, Gulati & Serra, 2012)




14/35Song A

Song B

Versions?


COVER SONG ID


Query-by-humming


5.6 5.8 6 6.2 6.4 6.60

0.1

0.2

0.3

0.4

0.5

Mean vibrato rate (vr:mean*)

Me

an

vib

rato

co

ve

rag

e (

v c:me

an

*)

Flamenco

Inst. jazz

Opera

Pop

Vocal jazz

SINGING STYLE classification

(Salamon, Rocha & Gómez, 2012)

100 150 200 250 300 350 400

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1

Frequency bins (1 bin = 10 cents), Ref: 55Hz

Norm

aliz

ed s

alie

nce

Multipitch Histogram

ρ2

ρ4 ρ3

ρ5

Tonic ID

(Salamon, Gulati & Serra, 2012)



Machine Learning for Pitch Analysis

16/35Melody Extraction

19/35Melodia (Salamon & Gómez, 2012)


Contour Extraction Melody f0Contour Selection


Melody f0Contour Extraction Contour Selection


Melody f0

Audio signal

Sinusoid extraction

Salience function

Pitch contour creation

Melody selection

Iterative

Melody f0 sequence

Spectral peaks

Time-frequency salience

Pitch contours

Bin salience mapping with harmonic weighting

Contour characterisation

Peak streaming Peak filtering

Frequency/amplitude correction

Spectral transform

Equal loudness filter

Melody peak selection

Pitch outlier removal

Melody pitch mean

Octave error removal

Voicing detection

Melody pitch mean

Audio signal

Sinusoid extraction

Salience function


Melody selection

Iterative

Melody f0 sequence

Spectral peaks


Pitch contours





Spectral transform




Melody pitch mean


Voicing detection

Melody pitch mean

Contour Extraction Contour Selection


Melody f0

Audio signal

Sinusoid extraction

Salience function


Melody selection

Iterative

Melody f0 sequence

Spectral peaks


Pitch contours





Spectral transform




Melody pitch mean


Voicing detection

Melody pitch mean

Audio signal

Sinusoid extraction

Salience function


Melody selection

Iterative

Melody f0 sequence

Spectral peaks


Pitch contours





Spectral transform




Melody pitch mean


Voicing detection

Melody pitch mean



Melody f0

Audio signal

Sinusoid extraction

Salience function


Melody selection

Iterative

Melody f0 sequence

Spectral peaks


Pitch contours





Spectral transform




Melody pitch mean


Voicing detection

Melody pitch mean

Audio signal

Sinusoid extraction

Salience function


Melody selection

Iterative

Melody f0 sequence

Spectral peaks


Pitch contours





Spectral transform




Melody pitch mean


Voicing detection

Melody pitch mean


?

Melodia (Salamon & Gómez, 2012)


vibrato

pitch deviation

length

pitch mean

salience

rate extent

1000 2000 3000 4000 5000 60000

0.1

0.2

(a)Pitch mean distribution (cents)

0 100 200 300 400 5000

0.4

0.8

(b)Pitch standard deviation distribution (cents)

0 1 2 30

0.075

0.15

(c)Contour mean salience distribution

0 1 2 3 4 50

0.15

0.3

(d)Contour salience standard deviation distribution

0 2 4 6 8 100

0.4

0.8

(e)Contour total salience distribution

0 2 4 60

0.4

0.8

(f)Contour length distribution (seconds)


31/35

Melody f0

Audio signal

Sinusoid extraction

Salience function


Melody selection

Iterative

Melody f0 sequence

Spectral peaks


Pitch contours





Spectral transform




Melody pitch mean


Voicing detection

Melody pitch mean

Audio signal

Sinusoid extraction

Salience function


Melody selection

Iterative

Melody f0 sequence

Spectral peaks


Pitch contours





Spectral transform




Melody pitch mean


Voicing detection

Melody pitch mean



32/35

Melody f0

Audio signal

Sinusoid extraction

Salience function


Melody selection

Iterative

Melody f0 sequence

Spectral peaks


Pitch contours





Spectral transform




Melody pitch mean


Voicing detection

Melody pitch mean



33/35

Melody f0

Audio signal

Sinusoid extraction

Salience function


Melody selection

Iterative

Melody f0 sequence

Spectral peaks


Pitch contours





Spectral transform




Melody pitch mean


Voicing detection

Melody pitch mean



MIREX: 75%MedleyDB: 57%

34/35Contour Classification (Salamon, Peeters & Röbel, 2012)

Melody f0

Audio signal

Sinusoid extraction

Salience function


Melody selection

Iterative

Melody f0 sequence

Spectral peaks


Pitch contours





Spectral transform




Melody pitch mean


Voicing detection

Melody pitch mean

Melody contour features

Accompaniment contour features


35/35Contour Classification (Bittner et al., 2015)

Melody f0

Audio signal

Sinusoid extraction

Salience function


Melody selection

Iterative

Melody f0 sequence

Spectral peaks


Pitch contours





Spectral transform




Melody pitch mean


Voicing detection

Melody pitch mean


36/35Contour Classification (Bittner et al., 2015)

Melody f0

Audio signal

Sinusoid extraction

Salience function


Melody selection

Iterative

Melody f0 sequence

Spectral peaks


Pitch contours





Spectral transform




Melody pitch mean


Voicing detection

Melody pitch mean


37/35Contour Classification (Bosch et al., 2016)


38/35Contour Classification (???, 2017)


?

39/35Contour Classification (???, 2017)

Melody f0Deep Melody Net

?

Data Scarcity

Continuous Melody f0 Annotation


MonophonicPitch Tracker



Tony (Mauch et al., 2015)

Datasets for Melody Extraction Eval


‣  MIREX‣  ADC2004: 6 minutes (20 excerpts)‣  MIREX05: 12 minutes (25 excerpts)



‣  MIR1K: 2.2 hr (1000 excerpts), C-Pop



‣  MIR1K: 2.2 hr (1000 excerpts), C-Pop‣  RWC-Pop: 6.8 hr (100 songs), J-Pop



‣  MIR1K: 2.2 hr (1000 excerpts), C-Pop‣  RWC-Pop: 6.8 hr (100 songs), J-Pop‣  MedleyDB (Bittner et al., 2014):

‣  7.5 hr (108 songs), varied




‣  7.5 hr (108 songs), varied‣  MedleyDBv2 (coming soon):

‣  ~17hr (~240 songs), varied






~50 annotator-hours108 songs






~50 annotator-hours108 songs 20 million songs






~50 annotator-hours108 songs

~1057 annotator-years20 million songs

Data Scarcity

Data ScarcityOvercoming

Crowdsourcing Melody Note Annotations

Crowdsourcing Melody Note Annotations

Ensemble (Tse et al., 2016)

Data Augmentation: f0 Annotation-by-Synthesis


Cleaning +Smoothing



Cleaning +Smoothing


SinusoidalModelling


Cleaning +Smoothing


SinusoidalModelling

Synthesis


Cleaning +Smoothing


SinusoidalModelling

SynthesisMixing

Algorithm 1

Algorithm 2

Algorithm 3

Metrics: A B C D E

Original data + manual annotations

Synthesized data + automatic annotations


Summary 70/35

Summary

‣  Active music discovery: user plays active role in retrieval

71/35

Summary

‣  Active music discovery: user plays active role in retrieval‣  Examples: QBH, search-by-singing-style, computational

(ethno)musicology…

72/35

Summary


(ethno)musicology…‣  Require (auto) extraction of pitch content: melody extraction

73/35

Summary



‣  Melody extraction: classification at contour level

74/35

Summary



‣  Melody extraction: classification at contour level‣  Data scarcity: can’t explore high-capacity (and data

hungry) models

75/35

Summary




hungry) models‣  Solutions:

76/35

Summary





‣  Crowdsourcing

77/35

Summary





‣  Crowdsourcing ‣  Data augmentation: annotation-by-synthesis

78/35

Summary






‣  Thanks!

79/35

Summary






‣  Thanks!

80/35

@justin_salamon [email protected] www.justinsalamon.com

pitch analysis for active music discovery - justin salamon...1 pitch analysis for active music...

Documents