A small footprint for audio and music classification
Hamid Eghbal-zadeh
Outline
1. Introduction
2. I-Vector representation
3. Some results
4. Conclusion
INTRODUCTION
A small footprint for Audio and Music classification
Audio → Acoustic features → Front-end → Small footprint $(a_1, a_2, \ldots, a_n)$ → Classifier
o Front-end:
• Block-level features (Genre classification) [Seyerlehner, 2010] (signal processing)
• Adapted GMM means (Genre classification) [Charbuillet, 2011] (machine learning)
• Adapted RBM weights (Speaker verification) [Ghahabi, 2014] (machine learning)
• Factor Analysis (Artist recognition) [Eghbal-zadeh, 2015] (machine learning)
Dev: Dev db → Universal Background Model (UBM)
Train: Train db + UBM → Adaptation → Adapted UBM params → Classifier (train)
Test: Test db + UBM → Adaptation → Adapted UBM params → Classifier (test)
Dev: Dev db → Universal Background Model (UBM)
Train: Train db + UBM → Adaptation → Adapted UBM params → Factor analysis (train) → Classifier (train)
Test: Test db + UBM → Adaptation → Adapted UBM params → Factor analysis (test) → Classifier (test)
Effect of Factor Analysis step
[Figure: an example of songs in the GTZAN dataset from 3 genres [Eghbal-zadeh, ISMIR2015]. Right: without Factor Analysis. Left: with Factor Analysis.]
[Figure: artist recognition performance on Artist20 with and without Factor Analysis [Eghbal-zadeh, Eusipco2015]. Panels: without FA, with FA.]
Other benefits:
• Noise-robust features [Eghbal-zadeh, ISMIR2016]
• Combined with Neural Nets [Eghbal-zadeh, DAFx2016]
• Successfully used in different tasks:
– Speaker verification
– Language recognition
– Artist recognition
– Music similarity
– Audio scene classification
Why apply Factor Analysis?
I-vectors, the output of the Factor Analysis step:
• Provide an information-rich, fixed-length, low-dimensional representation
• Have a single-Gaussian distribution, so we can use the properties of Gaussians
• Can be easily scored using cosine distance
• Are the latent factors estimated by a Factor Analysis procedure, with good discriminative power
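Cosine scoring of fixed-length vectors can be sketched as follows. This is a minimal illustration, not the scoring code behind the reported results: the function names `cosine_score` and `classify` and the 3-dimensional toy vectors are assumptions made for the example. It computes cosine similarity (higher means closer), which is the quantity usually meant by "cosine distance" scoring.

```python
import numpy as np

def cosine_score(a, b):
    """Cosine similarity between two fixed-length vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def classify(test_vec, class_means):
    """Return the label whose mean vector has the highest cosine score."""
    return max(class_means, key=lambda label: cosine_score(test_vec, class_means[label]))

# Hypothetical toy vectors standing in for per-class averaged i-vectors.
class_means = {
    "A": np.array([1.0, 0.0, 0.0]),
    "B": np.array([0.0, 1.0, 0.0]),
}
predicted = classify(np.array([0.9, 0.1, 0.0]), class_means)
print(predicted)
```

Because the score depends only on the angle between vectors, it needs no trained classifier, which is one reason it is convenient for low-dimensional representations.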
I-VECTOR REPRESENTATION AS A SMALL FOOTPRINT
Dev: Dev db → UBM (GMM)
Train: Train db + UBM → Adapted GMM params (statistical representation) → Factor analysis (train) → Classifier (train)
Test: Test db + UBM → Adapted GMM params (statistical representation) → Factor analysis (test) → Classifier (test)
Different Factor Analysis approaches:
• Eigenvoice FA: $M = m + V y$
– $M$: adapted GMM mean
– $m$: UBM mean
– $V$: eigenvoice subspace
– $y$: hidden vector
• Joint Factor Analysis (JFA): $M = m + V y + U x + D z$
– $M$: adapted GMM mean
– $m$: UBM mean
– $V y$: artist subspace
– $U x$: song subspace
– $D z$: residual
• I-vector FA: $M = m + T y$
– $M$: adapted GMM mean
– $m$: UBM mean
– $T$: low-rank matrix that models both artist and song together
– $y$: hidden vector (i-vector)
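To make the i-vector model $M = m + T y$ concrete, here is a minimal sketch of estimating the hidden vector $y$ from per-component statistics. It uses the posterior-mean formula common in the i-vector literature under the simplifying assumption of unit UBM covariances. The function name `extract_ivector`, the dimensions, and the synthetic noise-free data are all assumptions for illustration; the slides do not specify an implementation.

```python
import numpy as np

def extract_ivector(T, N, F_centered, n_feat):
    """Posterior-mean estimate of y in M = m + T y.

    T          : (C * n_feat, R) low-rank matrix
    N          : (C,) zero-order statistics (soft frame counts per component)
    F_centered : (C * n_feat,) first-order statistics centered on the UBM means
    Assumes unit UBM covariances, so no covariance weighting appears.
    """
    R = T.shape[1]
    N_expanded = np.repeat(N, n_feat)  # one count per supervector dimension
    precision = np.eye(R) + T.T @ (N_expanded[:, None] * T)
    return np.linalg.solve(precision, T.T @ F_centered)

# Synthetic check: build statistics from a known y and try to recover it.
rng = np.random.default_rng(0)
C, n_feat, R = 4, 3, 2                      # components, feature dim, i-vector dim
T = rng.normal(size=(C * n_feat, R))
y_true = rng.normal(size=R)
N = np.full(C, 1e4)                         # many frames per component
offset = T @ y_true                         # M - m for this recording
F_centered = np.repeat(N, n_feat) * offset  # noise-free centered statistics
y_hat = extract_ivector(T, N, F_centered, n_feat)
print(np.round(y_true, 3), np.round(y_hat, 3))
```

With many frames the identity prior term becomes negligible, so the estimate approaches the true hidden vector; with few frames the estimate shrinks toward zero.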
An example of i-vector based systems
1. Extract features {MFCC}
2. Compute statistics, then extract i-vectors {I-vector extraction}
3. Post-processing {LDA/WCCN/…}
4. Classification {Cosine score, …}
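The "compute statistics" step in such a pipeline is usually the accumulation of zero-order and first-order statistics of the frame-level features against the UBM. The sketch below assumes a diagonal-covariance GMM as the UBM; the function name `baum_welch_stats` and the toy data are illustrative assumptions, not taken from the slides.

```python
import numpy as np

def baum_welch_stats(X, weights, means, variances):
    """Zero-order and centered first-order statistics against a diagonal GMM.

    X         : (n_frames, n_feat) frame-level features (e.g. MFCCs)
    weights   : (C,) mixture weights
    means     : (C, n_feat) component means
    variances : (C, n_feat) diagonal covariances
    Returns N with shape (C,) and F with shape (C, n_feat).
    """
    diff = X[:, None, :] - means[None, :, :]
    # Log density of every frame under every component.
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2.0 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_gauss
    log_post -= log_post.max(axis=1, keepdims=True)  # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)          # frame posteriors
    N = post.sum(axis=0)                             # soft counts per component
    F = post.T @ X - N[:, None] * means              # centered on the UBM means
    return N, F

# Toy data: 5 frames at the first mean, 3 frames at the second.
X = np.vstack([np.zeros((5, 2)), np.full((3, 2), 10.0)])
weights = np.array([0.5, 0.5])
means = np.array([[0.0, 0.0], [10.0, 10.0]])
variances = np.ones((2, 2))
N, F = baum_welch_stats(X, weights, means, variances)
print(N)
```

Flattening `F` with `F.ravel()` gives the statistics in supervector order (component by component), which is the form a factor analysis step typically consumes.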
Within-Class Covariance Normalization (WCCN)
Quantities labeled in the slide's equations:
• Averaged i-vector for class c
• i-th i-vector from class c
• Number of i-vectors from class c
• Number of classes
• WCCN projection matrix
• Within-class covariance matrix
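The slide's equations did not survive extraction, so the following is a sketch of the commonly used WCCN definition, which matches the labeled quantities: the within-class covariance is $W = \frac{1}{C}\sum_{c=1}^{C}\frac{1}{n_c}\sum_{i=1}^{n_c}(w_i^c - \bar{w}_c)(w_i^c - \bar{w}_c)^\top$, and the projection matrix $B$ is obtained from the Cholesky decomposition $W^{-1} = B B^\top$. The function name `wccn_projection` and the synthetic data are assumptions for illustration.

```python
import numpy as np

def wccn_projection(ivectors, labels):
    """Return the WCCN projection matrix B and within-class covariance W."""
    ivectors = np.asarray(ivectors, dtype=float)
    labels = np.asarray(labels)
    classes = np.unique(labels)
    dim = ivectors.shape[1]
    W = np.zeros((dim, dim))
    for c in classes:
        class_vecs = ivectors[labels == c]
        centered = class_vecs - class_vecs.mean(axis=0)  # subtract class average
        W += centered.T @ centered / len(class_vecs)
    W /= len(classes)
    B = np.linalg.cholesky(np.linalg.inv(W))  # B @ B.T equals inv(W)
    return B, W

# Synthetic 3-dimensional vectors for two classes with unequal spread per axis.
rng = np.random.default_rng(0)
scale = [1.0, 3.0, 0.5]
X = np.vstack([rng.normal(0.0, scale, (50, 3)), rng.normal(2.0, scale, (50, 3))])
labels = np.array([0] * 50 + [1] * 50)

B, W = wccn_projection(X, labels)
projected = X @ B                       # row-vector form of B.T @ w
_, W_projected = wccn_projection(projected, labels)
print(np.round(W_projected, 3))
```

After projection the within-class covariance becomes the identity, so directions with large within-class spread no longer dominate a subsequent cosine score.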
Within-Class Covariance Normalization
[Figure: Class A and Class B under the WCCN projection. The within-class variability is reduced.]
Some results
Tasks
• Audio Scene Classification
– DCASE-2016 challenge
– 15 different scenes (30-second audio recordings from: train, tram, office, outdoor, etc.)
– We won the challenge!
• Music Similarity
– GTZAN and 1517Artists
– Evaluated using genre labels
• Music Artist Recognition
– Artist20 and MSD
– Noise-robust Music Artist Recognition (MAR) using 12 different kinds and levels of noise
Audio Scene Classification Challenge (DCASE-2016 [1])
• Our approach: an i-vector DNN hybrid (4 submissions among 49 participants)
– 1st place: hybrid
– 2nd place: i-vector
– 5th place: i-vector
– 14th place: DNN
[1] http://www.cs.tut.fi/sgn/arg/dcase2016/
Music Similarity [ISMIR-2015]
• UBM trained on the 1517Artists db, tested on GTZAN
• I-vectors are extracted without supervision
• Evaluated with genre labels
Music Artist Recognition [Eusipco-2015]
• Artist20 db
– 20 artists
– 1413 songs
Music Artist Recognition [DAFx-2016]
• MSD db
– 50 artists
– 5,000 songs
[Figure labels: CDB-Net; Experiment 2 – Raw i-vectors]
Noise-Robust Music Artist Recognition [ISMIR-2016]
• Artist20 db
– 4 different noises:
• festival noise
• humming noise
• pink noise
• PUB noise
– 3 different SNR levels
Conclusion
Conclusion:
• A small footprint using FA
• Useful for different audio and music related tasks
• Robustness against noise
• Useful as Neural Net features
Thank you for your time