
Post on 04-Jan-2016


Page 1: Music Information Retrieval System based on Cascade Classifiers presented by Zbigniew W. Ras  http//: CCI, UNC-Charlotte

Music Information Retrieval System based on Cascade Classifiers

presented by

Zbigniew W. Ras

www.kdd.uncc.edu

http://www.mir.uncc.edu

CCI, UNC-Charlotte

Research sponsored by NSF IIS-0414815, IIS-0968647

Page 2

Collaborators:

Alicja Wieczorkowska (Polish-Japanese Institute of IT, Warsaw, Poland)

Krzysztof Marasek (Polish-Japanese Institute of IT, Warsaw, Poland)

PhD students supported by two NSF Grants:

Elzbieta Kubera (Maria Curie-Sklodowska University, Lublin, Poland)

Rory Lewis (University of Colorado at Colorado Springs, USA)

Wenxin Jiang (Fred Hutchinson Cancer Research Center in Seattle, USA)

Xin Zhang (University of North Carolina, Pembroke, USA)

Jacek Grekow (Bialystok University of Technology, Poland)

Amanda Cohen-Mostafavi (InfoBelt LLC, Charlotte, USA)

Page 3

Outcome: Musical database indexed by instruments.

MIRAI - Musical Database (mostly MUMS) [music pieces played by 57 different music instruments]

Goal: Design and implement a system for automatic indexing of music by instruments.

Page 4

MIRAI - Musical Database [music pieces played by 57+ different music instruments (listed below) and described by over 910 attributes]

Alto Flute, Bach-trumpet, bass-clarinet, bassoon, bass-trombone, Bb trumpet, b-flat clarinet, cello, cello-bowed, cello-martele, cello-muted, cello-pizzicato, contrabassclarinet, contrabassoon, crotales, c-trumpet, ctrumpet-harmonStemOut, doublebass-bowed, doublebass-martele, doublebass-muted, doublebass-pizzicato, eflatclarinet, electric-bass, electric-guitar, englishhorn, flute, frenchhorn, frenchHorn-muted, glockenspiel, marimba-crescendo, marimba-singlestroke, oboe, piano-9ft, piano-hamburg, piccolo, piccolo-flutter, saxophone-soprano, saxophone-tenor, steeldrums, symphonic, tenor-trombone, tenor-trombone-muted, tuba, tubular-bells, vibraphone-bowed, vibraphone-hardmallet, viola-bowed, viola-martele, viola-muted, viola-natural, viola-pizzicato, violin-artificial, violin-bowed, violin-ensemble, violin-muted, violin-natural-harmonics, xylophone.

Page 5

Automatic Indexing of Polyphonic Music

What is needed and where is the problem? A database of monophonic and polyphonic music signals and their descriptions in terms of the standard MPEG7 features and new features (including temporal ones). These signals are labeled by instruments, which form an additional feature called the decision feature.

Why is it needed? To build classifiers for automatic indexing of musical sound by instruments.

Page 6

Automatic Indexing of Music

Page 7

MIRAI - Cooperative Music Information Retrieval System based on Automatic Indexing

[Diagram labels: User, Query, Query Adapter, Indexed Audio Database, Instruments, Durations, Music Objects, Empty Answer?]

Page 8

Feature Database

[Diagram labels: lower level raw data, Feature Extraction, higher level representations (manageable), traditional pattern recognition (classification, clustering, regression)]

Signal data sampling: 0.12s frame size, 0.04s hop size

Feature extraction: MATLAB, frame by frame
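The sampling step on this slide (0.12 s frames advanced by a 0.04 s hop) can be sketched as below. This is only an illustration in Python, not the MATLAB code used for the system; the function name `split_into_frames` and the 44.1 kHz sample rate in the usage lines are assumptions.

```python
def split_into_frames(signal, sample_rate, frame_sec=0.12, hop_sec=0.04):
    """Cut a sequence of samples into overlapping fixed-length frames.

    Only frames that fit entirely inside the signal are returned.
    """
    frame_len = int(round(frame_sec * sample_rate))
    hop_len = int(round(hop_sec * sample_rate))
    if frame_len <= 0 or hop_len <= 0:
        raise ValueError("frame and hop must be at least one sample long")
    frames = []
    start = 0
    while start + frame_len <= len(signal):
        frames.append(signal[start:start + frame_len])
        start += hop_len
    return frames


# Hypothetical usage: half a second of silence at an assumed 44.1 kHz rate.
sample_rate = 44100
signal = [0.0] * (sample_rate // 2)
frames = split_into_frames(signal, sample_rate)
```

Each returned frame would then be passed to the feature extraction stage on its own.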

Page 9

MPEG7 features

[Flowchart nodes: Signal, Hamming Window, STFT (NFFT FFT points), Power Spectrum, Signal envelope, Fundamental Frequency, Harmonic Peaks Detection]

Features produced by the flowchart:

Spectral Centroid
Temporal Centroid
Log Attack Time
Instantaneous Harmonic Spectral Centroid
Instantaneous Harmonic Spectral Deviation
Instantaneous Harmonic Spectral Spread
Instantaneous Harmonic Spectral Variation
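The path from a frame through a Hamming window to a power spectrum and a spectral centroid can be sketched as follows. This is a simplified illustration, not the normative MPEG7 descriptor computation: it uses a direct DFT instead of an FFT and a linear frequency scale, and the function names, the 8 kHz sample rate, and the 64-sample frame are assumptions chosen to keep the example small.

```python
import cmath
import math


def hamming_window(length):
    """Symmetric Hamming window: 0.54 - 0.46 * cos(2*pi*n / (length - 1))."""
    if length == 1:
        return [1.0]
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (length - 1))
            for n in range(length)]


def power_spectrum(frame):
    """Power spectrum of one Hamming-windowed frame (bins 0 .. N/2).

    Uses a direct DFT for clarity; a real system would use an FFT.
    """
    n_fft = len(frame)
    window = hamming_window(n_fft)
    windowed = [s * w for s, w in zip(frame, window)]
    spectrum = []
    for k in range(n_fft // 2 + 1):
        coeff = sum(windowed[t] * cmath.exp(-2j * math.pi * k * t / n_fft)
                    for t in range(n_fft))
        spectrum.append(abs(coeff) ** 2)
    return spectrum


def spectral_centroid(power, sample_rate, n_fft):
    """Power-weighted mean frequency in Hz (linear frequency scale)."""
    total = sum(power)
    if total == 0:
        return 0.0
    return sum(k * sample_rate / n_fft * p for k, p in enumerate(power)) / total


# Hypothetical usage: a 1 kHz sine sampled at an assumed 8 kHz, 64-sample frame.
sample_rate, n_fft = 8000, 64
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(n_fft)]
centroid = spectral_centroid(power_spectrum(tone), sample_rate, n_fft)
```

For a pure tone the centroid should land close to the tone's frequency, which makes the sketch easy to sanity-check.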

Page 10

Derived Database

MPEG7 features:

Spectrum Centroid
Spectrum Spread
Spectrum Flatness
Spectrum Basis Functions
Spectrum Projection Functions
Log Attack Time
Harmonic Peaks
……………..

Non-MPEG7 features & new temporal features:

Roll-Off
Flux
Mel frequency cepstral coefficients (MFCC)
Tristimulus and similar parameters (contents of odd and even partials - Od, Ev)
Mean frequency deviation for low partials
Changing ratios of spectrum spread
Changing ratios of spectrum centroid

Page 11

New Temporal Features – $S'(i)$, $C'(i)$, $S''(i)$, $C''(i)$

$S'(i) = \frac{S(i+1) - S(i)}{S(i)}, \qquad C'(i) = \frac{C(i+1) - C(i)}{C(i)}$

where $S(i+1)$, $S(i)$ and $C(i+1)$, $C(i)$ are the spectrum spread and spectrum centroid of two consecutive frames: frame $i+1$ and frame $i$.

The changing ratios of spectrum spread and spectrum centroid for two consecutive frames are considered as the first derivatives of the spectrum spread and spectrum centroid.

Following the same method we calculate the second derivatives:

$S''(i) = \frac{S'(i+1) - S'(i)}{S'(i)}, \qquad C''(i) = \frac{C'(i+1) - C'(i)}{C'(i)}$

Remark: The sequence $[S(i), S(i+1), S(i+2), \ldots, S(i+k)]$ can be approximated by a polynomial $p(x) = a_0 + a_1 x + a_2 x^2 + a_3 x^3 + \ldots$; new features: $a_0, a_1, a_2, a_3, \ldots$
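The changing ratios defined on this slide translate directly into code. The sketch below applies the same ratio once for the first derivative and again for the second; the helper name `changing_ratios` and the sample spread values are made up for illustration.

```python
def changing_ratios(values):
    """Return (v[i+1] - v[i]) / v[i] for each pair of consecutive frames."""
    ratios = []
    for current, following in zip(values, values[1:]):
        if current == 0:
            raise ZeroDivisionError("ratio undefined when a frame value is zero")
        ratios.append((following - current) / current)
    return ratios


# Hypothetical spectrum spread values for four consecutive frames.
spread = [2.0, 3.0, 6.0, 9.0]
first = changing_ratios(spread)   # S'(i)
second = changing_ratios(first)   # S''(i), same formula applied to S'
```

Each application shortens the sequence by one element, so k+1 frames yield k first-derivative values and k-1 second-derivative values.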

Page 12

Classification confidence with temporal features

Experiment   Features                  Classifier      Confidence
1            S, C                      Decision Tree   80.47%
2            S, C, S', C'              Decision Tree   83.68%
3            S, C, S', C', S'', C''    Decision Tree   84.76%
4            S, C                      KNN             80.31%
5            S, C, S', C'              KNN             84.07%
6            S, C, S', C', S'', C''    KNN             85.51%

Experiment with WEKA: 19 instruments [flute, piano, violin, saxophone, vibraphone, trumpet, marimba, french-horn, viola, bassoon, clarinet, cello, trombone, accordion, guitar, tuba, english-horn, oboe, double-bass]. Decision tree: J48 with a 0.25 confidence factor for tree pruning and a minimum of 10 instances per leaf. KNN: 3 neighbors, with Euclidean distance used as the similarity function.
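For readers unfamiliar with the KNN setting used here (3 neighbors, Euclidean distance), a minimal sketch follows. It is not WEKA's implementation; the function names, the two-dimensional toy feature vectors, and the use of the vote fraction as a per-query confidence are assumptions for illustration only.

```python
import math
from collections import Counter


def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest neighbours.

    `train` is a list of (feature_vector, label) pairs.
    Returns (label, fraction of the k votes won by that label).
    """
    neighbours = sorted(train, key=lambda item: euclidean(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    label, count = votes.most_common(1)[0]
    return label, count / len(neighbours)


# Hypothetical toy data: two features per sound, two instruments.
train = [
    ((1.0, 1.0), "flute"), ((1.2, 0.9), "flute"), ((0.9, 1.1), "flute"),
    ((5.0, 5.0), "tuba"), ((5.2, 4.8), "tuba"), ((4.9, 5.1), "tuba"),
]
label, vote_share = knn_predict(train, (1.1, 1.0))
```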

Page 13

Confusion matrices: left is from Experiment 1, right is from Experiment 3. Correctly classified instances are highlighted in green and incorrectly classified instances in yellow.

Page 14

[Figure: three bar charts showing the precision, recall, and F-score of the decision tree for each instrument (Flute, Piano, Violin, Saxophone, Vibraphone, Trumpet, Marimba, French Horn, Viola, Bassoon, Clarinet, Cello, Trombone, Accordion, Guitar, Tuba, English Horn, Oboe, Double Bass); vertical axis from 0 to 1; series A, B, C.]

Page 15

Polyphonic sounds – how to handle?

1. Single-label classification based on sound separation

2. Multi-label classifiers

3. Training classifiers on polyphonic sounds?

Sound Separation Flowchart

[Flowchart nodes: Polyphonic Sound, Segmentation, Get frame, Feature extraction, Classifier, Get Instrument, Sound separation]

Problems? Information loss during the signal subtraction.

Page 16

Multi-label classifier [collection of N classifiers]

N – number of instruments

Window segmentation: 1 second window; frame – 0.12s; 22 frames with 0.04s hop size

[Diagram labels: Get frame, Features Extraction, N Classifiers, list of candidate instruments with confidences]

Instrument     Confidence
Candidate 1    70%
Candidate 2    50%
..             ..
Candidate N    10%

Confidence values shown in the diagram: 85%, 80%, 70%, 55%, 45%, 16%, 12%, ……
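One way to turn per-frame classifier outputs into a ranked candidate list for a whole window is sketched below. The slide only shows that N classifiers yield candidates with confidences; averaging over the frames of a window and cutting at a threshold are assumptions made here for illustration, as are the function name and the 0.4 threshold.

```python
def rank_instruments(frame_confidences, threshold=0.4):
    """Average each instrument's confidence over the frames of one window.

    `frame_confidences` is a list with one dict per frame, mapping an
    instrument name to a confidence in [0, 1]. Instruments whose average
    reaches `threshold` are returned, highest average first.
    """
    if not frame_confidences:
        return []
    totals = {}
    for frame in frame_confidences:
        for instrument, confidence in frame.items():
            totals[instrument] = totals.get(instrument, 0.0) + confidence
    count = len(frame_confidences)
    averaged = {name: total / count for name, total in totals.items()}
    ranked = sorted(averaged.items(), key=lambda pair: pair[1], reverse=True)
    return [(name, conf) for name, conf in ranked if conf >= threshold]


# Hypothetical outputs of the N classifiers for two frames of a window.
frames = [
    {"flute": 0.8, "violin": 0.5, "tuba": 0.1},
    {"flute": 0.6, "violin": 0.5, "tuba": 0.1},
]
candidates = rank_instruments(frames)
```

Because more than one instrument can pass the threshold, the result is a multi-label answer rather than a single winner.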

Page 17

Schema I - Hornbostel Sachs

[Tree diagram]

Top-level families: Aerophone, Chordophone, Membranophone, Idiophone

Second-level nodes: Free, Single Reed, Side, Lip Vibration, Whip

Instruments shown: Alto Flute, Flute, C Trumpet, French Horn, Tuba, Oboe, Bassoon

Page 18

Schema II - Play Methods

[Tree diagram]

Play methods: Muted, Pizzicato, Bowed, Picked, Shaken, Blow, ……

Instruments shown: Piccolo, Flute, Bassoon, Alto Flute, ……

Page 19

Hornbostel/Sachs

Instrument granularity: classifiers are trained at each level of the hierarchical tree.

We do not include membranophones because instruments in this family usually do not produce harmonic sound, so they need special techniques to be identified.

Page 20

Modules of cascade classifier for single instrument estimation - Hornbostel/Sachs

Pitch 3B

96.02% * 98.94% = 95.00% > 91.80%
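The four percentages on this slide are consistent with multiplying 96.02% by 98.94% to get 95.00% and comparing the result with 91.80%. The sketch below checks that arithmetic. Reading the two factors as confidences of successive cascade levels and 91.80% as a single non-cascade classifier is an assumption, as is the function name `cascade_confidence`.

```python
def cascade_confidence(level_confidences):
    """Multiply the confidences of the classifiers along one cascade path."""
    product = 1.0
    for confidence in level_confidences:
        product *= confidence
    return product


# Values taken from the slide; their roles are assumed as described above.
cascade = cascade_confidence([0.9602, 0.9894])
single_classifier = 0.9180
cascade_is_better = cascade > single_classifier
```

Multiplying level confidences treats the levels as independent, which is a simplification.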

Page 21

HIERARCHICAL STRUCTURE BUILT BY CLUSTERING ANALYSIS

Seven common methods to calculate the distance or similarity between clusters: single linkage (nearest neighbor), complete linkage (furthest neighbor), unweighted pair-group method using arithmetic averages (UPGMA), weighted pair-group method using arithmetic averages (WPGMA), unweighted pair-group method using the centroid average (UPGMC), weighted pair-group method using the centroid average (WPGMC), Ward's method.

Six most common distance functions: Euclidean, Manhattan, Canberra (sums a series of fractional differences between the coordinates of a pair of objects), Pearson correlation coefficient (PCC, measures the degree of association between objects), Spearman's rank correlation coefficient, Kendall (counts the number of pairwise disagreements between two lists).

Clustering algorithm – HCLUST (agglomerative hierarchical clustering) – R package
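Four of the distance functions named on this slide are short enough to write out. The sketch below is plain Python rather than the R package used in the experiments, and it covers only the distances, not the clustering itself. Defining the Pearson-based distance as 1 - r is one common convention and is an assumption here, as is skipping Canberra terms whose denominator is zero.

```python
import math


def euclidean(a, b):
    """Square root of the summed squared coordinate differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def canberra(a, b):
    """Sum of |x - y| / (|x| + |y|); terms with a zero denominator are skipped."""
    total = 0.0
    for x, y in zip(a, b):
        denominator = abs(x) + abs(y)
        if denominator != 0:
            total += abs(x - y) / denominator
    return total


def pearson_distance(a, b):
    """1 - r, where r is the Pearson correlation coefficient of a and b."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    covariance = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    spread_a = math.sqrt(sum((x - mean_a) ** 2 for x in a))
    spread_b = math.sqrt(sum((y - mean_b) ** 2 for y in b))
    if spread_a == 0 or spread_b == 0:
        raise ValueError("correlation undefined for a constant vector")
    return 1.0 - covariance / (spread_a * spread_b)


# Hypothetical feature vectors: v is u scaled by two.
u = [1.0, 2.0, 3.0]
v = [2.0, 4.0, 6.0]
distances = (euclidean(u, v), manhattan(u, v), canberra(u, v), pearson_distance(u, v))
```

Note how the scaled vector is far from the original under Euclidean and Manhattan distance but at distance zero under the Pearson-based measure, which is why the choice of distance changes the resulting hierarchy.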

Page 22

Clustering result from the Hclust algorithm with the Ward linkage method and Pearson distance measure; Flatness coefficients are used as the selected feature.

“ctrumpet” and “bachtrumpet” are clustered in the same group. “ctrumpet_harmonStemOut” is clustered in a single group of its own instead of merging with “ctrumpet”. Bassoon is considered a sibling of the regular French horn. “French horn muted” is clustered in a different group together with “English Horn” and “Oboe”.

Page 23

Looking for optimal [classification method data representation] in polyphonic music

Exp#   Classifier                  Method                                    Recall    Precision   F-Score
1      Non-Cascade                 Single-label based on sound separation    31.48%    43.06%      36.37%
2      Non-Cascade                 Multi-label classification                85.51%    55.04%      66.97%
3      Cascade (Hornbostel)        Multi-label classification                64.49%    63.10%      63.79%
4      Cascade (Playmethod)        Multi-label classification                66.67%    55.25%      60.43%
5      Cascade (Machine Learned)   Multi-label classification                63.77%    69.67%      66.59%

Testing data: 49 polyphonic sounds created by selecting three different single-instrument sounds from the training database and mixing them together.

KNN (k=3) is used as the classifier for each experiment.
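The F-score column in this table agrees with the usual harmonic mean of recall and precision. The sketch below recomputes it from the recall and precision values reported for the five experiments; the function name `f_score` is an assumption.

```python
def f_score(recall, precision):
    """Harmonic mean of recall and precision (both on the same scale)."""
    if recall + precision == 0:
        return 0.0
    return 2 * recall * precision / (recall + precision)


# (recall %, precision %, reported F-score %) for experiments 1-5.
reported = [
    (31.48, 43.06, 36.37),
    (85.51, 55.04, 66.97),
    (64.49, 63.10, 63.79),
    (66.67, 55.25, 60.43),
    (63.77, 69.67, 66.59),
]
# Largest difference between a recomputed and a reported F-score.
max_gap = max(abs(f_score(r, p) - f) for r, p, f in reported)
```

The harmonic mean penalizes imbalance, which is why experiment 2, with the highest recall but low precision, ends up only slightly ahead of experiment 5.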

Page 24

Auto indexing system for musical instruments

Intelligent query answering system for music instruments

WWW.MIR.UNCC.EDU

Page 25

Page 26

User entering query

User is not satisfied and enters a new query

- Action Rules System

Page 27

Action Rule

An action rule is defined as a term

[(ω) ∧ (α → β)] → (ϕ → ψ)

where ω is the conjunction of fixed condition features shared by both groups, (α → β) represents the proposed changes in values of flexible features, and (ϕ → ψ) is the desired effect of the action.

Information System

A     B     D
a1    b2    d1
a2    b2
a2    b2    d2

Page 28

Action Rules Discovery

Meta-actions based decision system S(d) = (X, A ∪ {d}, V), with A = {A1, A2, …, Am}

Influence Matrix

       A1     A2     A3     A4     …..    Am
M1     E11    E12    E13    E14           E1m
M2     E21    E22    E23    E24           E2m
M3     E31    E32    E33    E34           E3m
M4     E41    E42    E43    E44           E4m
…..
Mn     En1    En2    En3    En4           Enm

Candidate action rule - r = [(A1, a1 → a1') ∧ (A2, a2 → a2') ∧ (A4, a4 → a4')] → (d, d1 → d1')

If E32 = [a2 → a2'], then E31 = [a1 → a1'], E34 = [a4 → a4']

Rule r is supported & covered by M3

Page 29

"Action Rules Discovery without pre-existing classification rules", Z.W. Ras, A. Dardzinska, Proceedings of RSCTC 2008 Conference, in Akron, Ohio, LNAI 5306, Springer, 2008, 181-190 http://www.cs.uncc.edu/~ras/Papers/Ras-Aga-AKRON.pdf

Page 30

Since the window diminishes the signal at both edges, it leads to information loss due to the narrowing of the frequency spectrum. To preserve this information, consecutive analysis frames overlap in time. Empirical experiments show that the best overlap is two thirds of the window size.

[Figure: overlapping analysis frames, labeled A and B, along the time axis]

Page 31

Windowing

Hamming window spectral leakage