A small footprint for audio and music classification

Hamid Eghbal-zadeh


Outline

1. Introduction

2. I-Vector representation

3. Some results

4. Conclusion


INTRODUCTION


A small footprint for Audio and Music classification

Audio → Acoustic features → Front-end → Small footprint (a1, a2, ..., an) → Classifier

o Front-end:
• Block-level features (Genre classification) [Seyerlehner, 2010]: signal processing
• Adapted GMM means (Genre classification) [Charbuillet, 2011]: machine learning
• Adapted RBM weights (Speaker verification) [Ghahabi, 2014]: machine learning
• Factor Analysis (Artist recognition) [Eghbal-zadeh, 2015]: machine learning


Dev: Dev db → Universal Background Model (UBM)

Train: Train db + UBM → UBM adaptation → Adapted UBM params → Classifier (train)

Test: Test db + UBM → UBM adaptation → Adapted UBM params → Classifier (test)
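The UBM adaptation step can be illustrated with a minimal sketch. It assumes the UBM is a diagonal-covariance GMM and that only the means are adapted (relevance MAP); the function names, the relevance factor of 16, and the toy data are illustrative assumptions, not details taken from the slides.

```python
import numpy as np

def baum_welch_stats(X, weights, means, variances):
    """Zeroth- and first-order statistics of frames X (T, D) under a
    diagonal-covariance GMM with C components (means, variances: (C, D))."""
    diff = X[:, None, :] - means[None, :, :]                     # (T, C, D)
    log_gauss = -0.5 * (np.sum(diff ** 2 / variances, axis=2)
                        + np.sum(np.log(2 * np.pi * variances), axis=1))
    log_post = np.log(weights) + log_gauss                       # (T, C)
    log_post -= np.max(log_post, axis=1, keepdims=True)          # numerical stability
    post = np.exp(log_post)
    post /= post.sum(axis=1, keepdims=True)                      # frame posteriors
    N = post.sum(axis=0)                                         # (C,)
    F = post.T @ X                                               # (C, D)
    return N, F

def map_adapt_means(N, F, means, relevance=16.0):
    """Mean-only MAP adaptation; `relevance` is an assumed hyperparameter."""
    alpha = (N / (N + relevance))[:, None]                       # data-dependent weight
    data_means = F / np.maximum(N, 1e-10)[:, None]               # avoid division by zero
    return alpha * data_means + (1.0 - alpha) * means

# Toy UBM with two components in two dimensions (hypothetical values).
ubm_weights = np.array([0.5, 0.5])
ubm_means = np.array([[-5.0, -5.0], [5.0, 5.0]])
ubm_vars = np.ones((2, 2))
frames = np.full((100, 2), 6.0)                                  # one "recording"

N, F = baum_welch_stats(frames, ubm_weights, ubm_means, ubm_vars)
adapted = map_adapt_means(N, F, ubm_means)
supervector = adapted.reshape(-1)                                # adapted UBM params
```

The stacked adapted means (`supervector`) play the role of the "Adapted UBM params" that are passed on to the classifier.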



Dev: Dev db → Universal Background Model (UBM)

Train: Train db + UBM → UBM adaptation → Adapted UBM params → Factor analysis → Classifier (train)

Test: Test db + UBM → UBM adaptation → Adapted UBM params → Factor analysis → Classifier (test)

Effect of Factor Analysis step

[Figure] An example of songs from 3 genres in the GTZAN dataset [Eghbal-zadeh, ISMIR2015]. Left: with Factor Analysis. Right: without Factor Analysis.

[Figure] Artist recognition performance on Artist20 without FA and with FA [Eghbal-zadeh, Eusipco2015].

Other benefits:

• Noise-robust features [Eghbal-zadeh, ISMIR2016]

• Can be combined with neural nets [Eghbal-zadeh, DAFx2016]

• Successfully used in different tasks:
– Speaker verification
– Language recognition
– Artist recognition
– Music similarity
– Audio scene classification

Why apply Factor Analysis?

• The extracted factors provide an information-rich, fixed-length, low-dimensional representation

• They follow a single-Gaussian distribution
– We can use the properties of Gaussians

• They can be scored easily
– Using cosine distance

• They are estimated latent factors, obtained from a Factor Analysis procedure, with good discriminative power
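Cosine scoring of fixed-length vectors, as mentioned in the list above, can be sketched in a few lines. The function names and the idea of comparing a test vector against one averaged vector per class are illustrative assumptions.

```python
import math

def cosine_score(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def classify(test_vector, class_models):
    """Return the label whose model vector scores highest against test_vector.
    class_models maps a label to one vector per class (hypothetical setup)."""
    return max(class_models, key=lambda label: cosine_score(test_vector, class_models[label]))

# Toy usage with made-up two-dimensional vectors.
models = {"genre_a": [1.0, 0.0], "genre_b": [0.0, 1.0]}
predicted = classify([0.9, 0.2], models)
```

Because the score depends only on the angle between vectors, it needs no trained parameters beyond the vectors themselves.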


I-VECTOR REPRESENTATION AS A SMALL FOOTPRINT


Dev: Dev db → UBM (GMM)

Train: Train db + UBM → Adapted GMM params (statistical representation) → Factor analysis → Classifier (train)

Test: Test db + UBM → Adapted GMM params (statistical representation) → Factor analysis → Classifier (test)

Different Factor Analysis approaches:

Eigenvoice FA: M = m + V y
• M: adapted GMM mean
• m: UBM mean
• V: eigenvoice subspace
• y: hidden vector

Joint Factor Analysis (JFA): M = m + V y + U x + D z
• M: adapted GMM mean
• m: UBM mean
• V: artist subspace
• U: song subspace
• D z: residual

I-vector FA: M = m + T y
• M: adapted GMM mean
• m: UBM mean
• T: low-rank matrix that models both artist and song together
• y: hidden vector (i-vector)
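For the i-vector model M = m + T y, the i-vector is commonly taken to be the posterior mean of y given the statistics of one recording. The sketch below assumes diagonal UBM covariances and an already trained matrix T; `extract_ivector`, its argument names, and the toy values are hypothetical.

```python
import numpy as np

def extract_ivector(N, F, m, Sigma, T):
    """Posterior mean of y in M = m + T y for one recording.

    N:     (C,)     zeroth-order statistics per UBM component
    F:     (C, D)   first-order statistics per UBM component
    m:     (C, D)   UBM means
    Sigma: (C, D)   diagonal UBM covariances
    T:     (C*D, R) low-rank matrix (assumed to be trained beforehand)

    Computes y = (I + T' S^-1 N T)^-1 T' S^-1 (F - N m).
    """
    C, D = m.shape
    R = T.shape[1]
    F_centered = (F - N[:, None] * m).reshape(C * D)   # center stats on UBM means
    inv_sigma = (1.0 / Sigma).reshape(C * D)
    N_expanded = np.repeat(N, D)                       # one count per supervector dim
    T_scaled = T * inv_sigma[:, None]                  # S^-1 T
    precision = np.eye(R) + T.T @ (N_expanded[:, None] * T_scaled)
    return np.linalg.solve(precision, T_scaled.T @ F_centered)

# Toy usage: 2 components, 4 feature dimensions, 2-dimensional i-vector.
C, D, R = 2, 4, 2
ubm_means = np.zeros((C, D))
ubm_covs = np.ones((C, D))
T_matrix = np.vstack([np.eye(R)] * 4)                  # (8, 2), made-up values
counts = np.array([30.0, 70.0])
first_order = counts[:, None] * (ubm_means + 0.1)      # made-up statistics
ivector = extract_ivector(counts, first_order, ubm_means, ubm_covs, T_matrix)
```

With few frames the identity term dominates and the estimate shrinks toward zero; with many frames it approaches the least-squares fit of the centered statistics.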


An example of i-vector based systems

1. Extract features {MFCC}

2. Compute statistics

3. Extract i-vectors {I-vector extraction}

4. Post-processing {LDA/WCCN/…}

5. Classification {Cosine score, …}

Within-Class Covariance Normalization

Quantities involved:

• Within-class covariance matrix

• WCCN projection matrix

• Averaged i-vector for class c

• i-th i-vector from class c

• Number of i-vectors from class c

• Number of classes

Within-Class Covariance Normalization

[Figure: Class A and Class B under the WCCN projection]

The within-class variability is reduced.
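A minimal sketch of WCCN, assuming the common formulation: the within-class covariance matrix W is the average over classes of each class's covariance around its own mean i-vector, and the projection matrix B satisfies B Bᵀ = W⁻¹ (obtained here by Cholesky decomposition). The function names and the random toy data are illustrative assumptions.

```python
import numpy as np

def within_class_covariance(ivectors, labels):
    """Average over classes of the per-class covariance around the class mean."""
    labels = np.asarray(labels)
    classes = np.unique(labels)
    dim = ivectors.shape[1]
    W = np.zeros((dim, dim))
    for c in classes:
        class_vectors = ivectors[labels == c]
        centered = class_vectors - class_vectors.mean(axis=0)   # subtract class mean
        W += centered.T @ centered / len(class_vectors)
    return W / len(classes)

def wccn_projection(ivectors, labels):
    """Return B with B B^T = W^-1; project row vectors with `ivectors @ B`."""
    W = within_class_covariance(ivectors, labels)
    return np.linalg.cholesky(np.linalg.inv(W))

# Toy usage: two classes with unequal spread along each dimension.
rng = np.random.default_rng(0)
class_a = rng.normal(0.0, [1.0, 3.0, 0.5], size=(50, 3))
class_b = rng.normal(4.0, [1.0, 3.0, 0.5], size=(50, 3))
vectors = np.vstack([class_a, class_b])
labels = np.array([0] * 50 + [1] * 50)

B = wccn_projection(vectors, labels)
projected = vectors @ B
```

After the projection, the within-class covariance of the training vectors becomes the identity, so no direction is dominated by within-class variability.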

Some results


Tasks

• Audio Scene Classification

– DCASE-2016 challenge

– 15 different scenes (30-second recordings from train, tram, office, outdoor, etc.)

– We won the challenge!

• Music Similarity

– GTZAN and 1517Artists

– Evaluated using genre labels

• Music Artist Recognition

– Artist20 and MSD

– Noise-robust MAR using 12 noise conditions (4 kinds of noise × 3 SNR levels)

Audio Scene Classification Challenge (DCASE-2016 [1])

• Our approach: an i-vector and DNN hybrid (4 submissions among 49 participants)

– 1st place: hybrid

– 2nd place: i-vector

– 5th place: i-vector

– 14th place: DNN

[1] http://www.cs.tut.fi/sgn/arg/dcase2016/

Music Similarity [ISMIR-2015]

• UBM trained on the 1517Artists db, tested on GTZAN

• I-vectors are extracted without supervision

• Evaluated with genre labels

Music Artist Recognition [Eusipco-2015]

• Artist20 db

– 20 artists

– 1413 songs

Music Artist Recognition [DAFx-2016]

• MSD db

– 50 artists

– 5,000 songs

CDB-Net

Experiment 2 – Raw i-vectors

Noise-Robust Music Artist Recognition [ISMIR-2016]

• Artist20 db

– 4 different noises:

• festival noise

• humming noise

• pink noise

• PUB noise

– 3 different SNR levels
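Test conditions with several noise types and SNR levels can be simulated by scaling a noise recording before adding it to the clean signal. The sketch below is one common way to do this and is not taken from the slides; the function name, the looping of short noise clips, and the SNR values are assumptions.

```python
import numpy as np

def mix_at_snr(signal, noise, snr_db):
    """Add `noise` to `signal` so that the result has the requested SNR in dB.
    The noise is repeated or trimmed to match the signal length."""
    noise = np.resize(noise, signal.shape)
    signal_power = np.mean(signal ** 2)
    noise_power = np.mean(noise ** 2)
    # Gain that makes signal_power / (gain^2 * noise_power) equal 10^(snr_db / 10).
    gain = np.sqrt(signal_power / (noise_power * 10.0 ** (snr_db / 10.0)))
    return signal + gain * noise

# Toy usage: a sine tone stands in for music, white noise for a noise recording.
rng = np.random.default_rng(0)
time = np.arange(16000) / 16000.0
clean = np.sin(2.0 * np.pi * 440.0 * time)
noise_clip = rng.normal(size=4000)
noisy_versions = {snr: mix_at_snr(clean, noise_clip, snr) for snr in (0.0, 10.0, 20.0)}
```

Each entry in `noisy_versions` corresponds to one noise condition; repeating this over several noise recordings gives a grid of noise type by SNR level.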

Conclusion


Conclusion:

• A small footprint using FA

• Useful for different audio and music related tasks

• Robustness against noise

• Useful as Neural Net features


Thank you for your time
