Music Information Retrieval System based on Cascade Classifiers
presented by
Zbigniew W. Ras
www.kdd.uncc.edu
http://www.mir.uncc.edu
CCI, UNC-Charlotte
Research sponsored by NSF IIS-0414815, IIS-0968647
Collaborators:
Alicja Wieczorkowska (Polish-Japanese Institute of IT, Warsaw, Poland)
Krzysztof Marasek (Polish-Japanese Institute of IT, Warsaw, Poland)
PhD students supported by two NSF Grants:
Elzbieta Kubera (Maria Curie-Sklodowska University, Lublin, Poland)
Rory Lewis (University of Colorado at Colorado Springs, USA)
Wenxin Jiang (Fred Hutchinson Cancer Research Center in Seattle, USA)
Xin Zhang (University of North Carolina, Pembroke, USA)
Jacek Grekow (Bialystok University of Technology, Poland)
Amanda Cohen-Mostafavi (InfoBelt LCC, Charlotte, USA)
Outcome: Musical database indexed by instruments.
MIRAI - Musical Database (mostly MUMS) [music pieces played by 57 different music instruments]
Goal: Design and Implement a System for Automatic Indexing of Music by Instruments
Alto Flute, Bach-trumpet, bass-clarinet, bassoon, bass-trombone, Bb trumpet, b-flat clarinet, cello, cello-bowed, cello-martele, cello-muted, cello-pizzicato, contrabassclarinet, contrabassoon, crotales, c-trumpet, ctrumpet-harmonStemOut, doublebass-bowed, doublebass-martele, doublebass-muted, doublebass-pizzicato, eflatclarinet, electric-bass, electric-guitar, englishhorn, flute, frenchhorn, frenchHorn-muted, glockenspiel, marimba-crescendo, marimba-singlestroke, oboe, piano-9ft, piano-hamburg, piccolo, piccolo-flutter, saxophone-soprano, saxophone-tenor, steeldrums, symphonic, tenor-trombone, tenor-trombone-muted, tuba, tubular-bells, vibraphone-bowed, vibraphone-hardmallet, viola-bowed, viola-martele, viola-muted, viola-natural, viola-pizzicato, violin-artificial, violin-bowed, violin-ensemble, violin-muted, violin-natural-harmonics, xylophone.
MIRAI - Musical Database [music pieces played by 57+ different music instruments (see below) and described by over 910 attributes]
What is needed & where is the problem? Database of monophonic and polyphonic music signals and their descriptions in terms of the standard MPEG7 features and new features (including temporal). These signals are labeled by instruments, forming an additional feature called the decision feature.
Automatic Indexing of Polyphonic Music
Why is it needed? To build classifiers for automatic indexing of musical sound by instruments.
Automatic Indexing of Music
MIRAI - Cooperative Music Information Retrieval System based on Automatic Indexing
[Diagram blocks: User, Query, Query Adapter, Empty Answer?, Indexed Audio Database, Music Objects, Instruments, Durations, Feature Database]
Traditional pattern recognition
[Diagram blocks: lower level raw data, Feature Extraction, higher level representations (manageable), classification / clustering / regression]
Signal data sampling: 0.12s frame size, 0.04s hop size
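The sampling parameters above (0.12s frames, 0.04s hop) can be illustrated with a minimal framing sketch. The function name `frame_signal`, the 1000 Hz sample rate, and the toy input are assumptions made for this example; the slides do not state a sample rate or how partial frames at the end of a signal are handled (here they are dropped).

```python
def frame_signal(samples, sample_rate, frame_sec=0.12, hop_sec=0.04):
    """Split a sequence of samples into overlapping fixed-length frames."""
    frame_len = round(frame_sec * sample_rate)
    hop_len = round(hop_sec * sample_rate)
    frames = []
    start = 0
    # Keep only frames that fit entirely inside the signal (an assumption).
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames


# Hypothetical input: 0.5 s of samples at an assumed 1000 Hz sample rate.
signal = list(range(500))
frames = frame_signal(signal, sample_rate=1000)
print(len(frames), len(frames[0]), frames[1][0])
```

Each frame shares two thirds of its samples with the next one, because the hop is one third of the frame length.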
Feature extraction (MATLAB), per frame: MPEG7 features
[Diagram blocks: Signal, Hamming Window, STFT (NFFT FFT points), Power Spectrum, Spectral Centroid, Signal Envelope, Log Attack Time, Temporal Centroid, Fundamental Frequency, Harmonic Peaks Detection, Instantaneous Harmonic Spectral Centroid, Instantaneous Harmonic Spectral Deviation, Instantaneous Harmonic Spectral Spread, Instantaneous Harmonic Spectral Variation]
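One path through the feature-extraction diagram (frame, Hamming window, spectrum, power spectrum, spectral centroid) might look like the sketch below. It is not the authors' MATLAB code and not the exact MPEG7 definition: it uses a naive DFT and a plain power-weighted mean frequency on a linear scale, and the function names and the toy 64-sample sine frame are assumptions for illustration.

```python
import cmath
import math


def hamming(n_points):
    """Hamming window coefficients (symmetric form)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (n_points - 1))
            for n in range(n_points)]


def power_spectrum(frame):
    """Power of DFT bins 0..N/2 of a Hamming-windowed frame (naive O(N^2) DFT)."""
    n_points = len(frame)
    windowed = [x * w for x, w in zip(frame, hamming(n_points))]
    spectrum = []
    for k in range(n_points // 2 + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * n / n_points)
                  for n, x in enumerate(windowed))
        spectrum.append(abs(acc) ** 2)
    return spectrum


def spectral_centroid(frame, sample_rate):
    """Power-weighted mean frequency in Hz (linear frequency scale)."""
    power = power_spectrum(frame)
    n_points = len(frame)
    freqs = [k * sample_rate / n_points for k in range(len(power))]
    return sum(f * p for f, p in zip(freqs, power)) / sum(power)


# Toy frame: an 8 Hz sine sampled at an assumed 64 Hz.
sample_rate = 64
frame = [math.sin(2 * math.pi * 8 * n / sample_rate) for n in range(64)]
centroid = spectral_centroid(frame, sample_rate)
print(centroid)  # close to 8 Hz for this single-tone frame
```

In practice an FFT routine would replace the naive DFT; the quadratic version is used here only to keep the sketch dependency-free.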
Derived Database
MPEG7 features:
Spectrum Centroid
Spectrum Spread
Spectrum Flatness
Spectrum Basic Functions
Spectrum Projection Functions
Log Attack Time
Harmonic Peaks
...
Non-MPEG7 features & new temporal features:
Roll-Off
Flux
Mel frequency cepstral coefficients (MFCC)
Tristimulus and similar parameters (contents of odd and even partials - Od, Ev)
Mean frequency deviation for low partials
Changing ratios of spectrum spread
Changing ratios of spectrum centroid
S’(i) = [S(i+1) – S(i)]/S(i) ; C’(i) = [C(i+1) – C(i)]/C(i), where S(i+1), S(i) and C(i+1), C(i) are the spectrum spread and spectrum centroid of two consecutive frames: frame i+1 and frame i.
The changing ratios of spectrum spread and spectrum centroid for two consecutive frames are considered as the first derivatives of the spread and spectrum centroid.
Following the same method we calculate the second derivatives:
S’’(i) = [S’(i+1) – S’(i)]/S’(i) ; C’’(i) = [C’(i+1) – C’(i)]/C’(i)
New Temporal Features – S’(i), C’(i), S’’(i), C’’(i)
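The changing-ratio formulas translate directly into code. The sketch below is illustrative only: the function name `changing_ratios` and the spread values are made up, and no guard is included for a frame whose value is zero (the ratio is undefined there).

```python
def changing_ratios(values):
    """Relative change between consecutive frames: [v(i+1) - v(i)] / v(i)."""
    return [(nxt - cur) / cur for cur, nxt in zip(values, values[1:])]


spread = [2.0, 3.0, 6.0, 9.0]      # hypothetical S(i) for four frames
first = changing_ratios(spread)    # S'(i)
second = changing_ratios(first)    # S''(i), same formula applied to S'(i)
print(first)   # [0.5, 1.0, 0.5]
print(second)  # [1.0, -0.5]
```

Each application shortens the sequence by one element, so k+1 frames yield k first-order and k-1 second-order values.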
Remark: Sequence [S(i), S(i+1), S(i+2), ..., S(i+k)] can be approximated by a polynomial p(x) = a0 + a1*x + a2*x^2 + a3*x^3 + ... ; new features: a0, a1, a2, a3, ...
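A least-squares fit is one way to obtain such coefficients. The sketch below assumes x is the frame offset 0, 1, ..., k and solves the normal equations with Gaussian elimination; the slides do not specify the fitting procedure or the polynomial degree, so both are assumptions, as is the helper name `polyfit`.

```python
def polyfit(ys, degree):
    """Least-squares coefficients a0..a_degree for points (0, ys[0]), (1, ys[1]), ..."""
    xs = range(len(ys))
    size = degree + 1
    # Normal equations: (V^T V) a = V^T y, with V[r][c] = x_r ** c.
    lhs = [[float(sum(x ** (r + c) for x in xs)) for c in range(size)]
           for r in range(size)]
    rhs = [float(sum(y * x ** r for x, y in zip(xs, ys))) for r in range(size)]
    # Gaussian elimination with partial pivoting.
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(lhs[r][col]))
        lhs[col], lhs[pivot] = lhs[pivot], lhs[col]
        rhs[col], rhs[pivot] = rhs[pivot], rhs[col]
        for row in range(col + 1, size):
            factor = lhs[row][col] / lhs[col][col]
            for c in range(col, size):
                lhs[row][c] -= factor * lhs[col][c]
            rhs[row] -= factor * rhs[col]
    # Back substitution.
    coeffs = [0.0] * size
    for row in reversed(range(size)):
        tail = sum(lhs[row][c] * coeffs[c] for c in range(row + 1, size))
        coeffs[row] = (rhs[row] - tail) / lhs[row][row]
    return coeffs


spread = [1.0, 2.0, 5.0, 10.0, 17.0]   # hypothetical values following 1 + x^2
coeffs = polyfit(spread, degree=2)     # approximately [1.0, 0.0, 1.0]
print(coeffs)
```

The returned list [a0, a1, a2, ...] would then be appended to the frame's feature vector.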
Experiment  Features                  Classifier     Confidence
1           S, C                      Decision Tree  80.47%
2           S, C, S’, C’              Decision Tree  83.68%
3           S, C, S’, C’, S’’, C’’    Decision Tree  84.76%
4           S, C                      KNN            80.31%
5           S, C, S’, C’              KNN            84.07%
6           S, C, S’, C’, S’’, C’’    KNN            85.51%
Classification confidence with temporal features
Experiment with WEKA: 19 instruments [flute, piano, violin, saxophone, vibraphone, trumpet, marimba, french-horn, viola, bassoon, clarinet, cello, trombone, accordion, guitar, tuba, english-horn, oboe, double-bass]. J48 with 0.25 confidence factor for tree pruning and a minimum of 10 instances per leaf; KNN with 3 neighbors and Euclidean distance as the similarity function.
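The KNN setting (3 neighbors, Euclidean distance) can be sketched in a few lines. This is not WEKA's implementation; the function name `knn_predict` and the two-dimensional training points are invented for illustration, and ties are resolved by whichever label `Counter` returns first.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Majority vote among the k training items nearest to query (Euclidean distance).

    train: list of (feature_vector, label) pairs.
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-feature training set.
train = [
    ((0.20, 0.10), "flute"), ((0.25, 0.15), "flute"), ((0.22, 0.12), "flute"),
    ((0.80, 0.90), "tuba"), ((0.85, 0.95), "tuba"),
]
label = knn_predict(train, (0.30, 0.20))
print(label)  # flute
```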
Confusion matrices: left is from Experiment 1, right is from Experiment 3. The correctly classified instances are highlighted in green and the incorrectly classified instances are highlighted in yellow
[Figure: bar chart of precision (axis 0 to 1) per instrument: Flute, Piano, Violin, Saxophone, Vibraphone, Trumpet, Marimba, French horn, Viola, Bassoon, Clarinet, Cello, Trombone, Accordion, Guitar, Tuba, English horn, Oboe, Double bass; series A, B, C]
Precision of the decision tree for each instrument
[Figure: bar chart of recall (axis 0 to 1) per instrument: Flute, Piano, Violin, Saxophone, Vibraphone, Trumpet, Marimba, French horn, Viola, Bassoon, Clarinet, Cello, Trombone, Accordion, Guitar, Tuba, English horn, Oboe, Double bass; series A, B, C]
Recall of the decision tree for each instrument
[Figure: bar chart of F-score (axis 0 to 1) per instrument: Flute, Piano, Violin, Saxophone, Vibraphone, Trumpet, Marimba, French horn, Viola, Bassoon, Clarinet, Cello, Trombone, Accordion, Guitar, Tuba, English horn, Oboe, Double bass; series A, B, C]
F-score of the decision tree for each instrument
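Per-instrument precision, recall and F-score are usually read off a confusion matrix. The sketch below shows that computation under the convention that rows are true classes and columns are predicted classes; the two-class matrix is made up and is not data from the experiments.

```python
def per_class_scores(confusion, labels):
    """Return {label: (precision, recall, f_score)}.

    confusion[i][j] = number of instances of true class i predicted as class j.
    """
    scores = {}
    for i, label in enumerate(labels):
        tp = confusion[i][i]
        predicted = sum(row[i] for row in confusion)   # column total
        actual = sum(confusion[i])                     # row total
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f_score = (2 * precision * recall / (precision + recall)
                   if precision + recall else 0.0)
        scores[label] = (precision, recall, f_score)
    return scores


labels = ["flute", "piano"]          # hypothetical two-class example
confusion = [[8, 2],
             [4, 6]]
scores = per_class_scores(confusion, labels)
for name, (p, r, f) in scores.items():
    print(f"{name}: precision={p:.2f} recall={r:.2f} f={f:.2f}")
```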
[Flowchart blocks: Polyphonic Sound, Segmentation, Feature Extraction, Classifier, Get Instrument, Sound Separation]
Polyphonic sounds: how to handle?
1. Single-label classification based on sound separation
2. Multi-labeled classifiers
3. Training classifiers on polyphonic sounds?
[Flowchart step: Get frame]
Problems? Information loss during the signal subtraction
Sound Separation Flowchart
[Diagram: Feature Extraction feeds N classifiers; each classifier outputs a list of instrument candidates with confidences, e.g. Candidate 1: 70%, Candidate 2: 50%, ..., Candidate N: 10%]
Multi-label classifier [collection of N classifiers]
Window segmentation: 1 second window; frame: 0.12s; 22 frames with 0.04s hop size; get frame
N: number of instruments
[Example confidences shown: 85%, 80%, 70%, 55%, 45%, 16%, 12%, ...]
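A multi-label decision for one window could be formed from the per-frame candidate lists roughly as follows. The slides do not say how frame-level confidences are combined, so the averaging step, the 0.4 threshold, and the function name `label_window` are all assumptions made for this sketch.

```python
def label_window(frame_confidences, threshold=0.4):
    """Combine per-frame confidences into window-level instrument labels.

    frame_confidences: one dict per frame, mapping instrument -> confidence in [0, 1].
    Returns (instrument, mean confidence) pairs at or above threshold, best first.
    """
    totals = {}
    for confidences in frame_confidences:
        for instrument, value in confidences.items():
            totals[instrument] = totals.get(instrument, 0.0) + value
    n_frames = len(frame_confidences)
    means = {name: total / n_frames for name, total in totals.items()}
    ranked = sorted(means.items(), key=lambda pair: pair[1], reverse=True)
    return [(name, conf) for name, conf in ranked if conf >= threshold]


# Hypothetical output of the N classifiers for two frames of one window.
frames = [
    {"violin": 0.8, "piano": 0.5, "tuba": 0.1},
    {"violin": 0.6, "piano": 0.7, "tuba": 0.1},
]
labels = label_window(frames)
print([name for name, _ in labels])  # ['violin', 'piano']
```

Unlike single-label classification, several instruments can be returned for the same window, which is what polyphonic indexing needs.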
Schema I - Hornbostel Sachs
[Tree, top level: Aerophone, Chordophone, Membranophone, Idiophone; second level shown: Free, Single Reed, Side, Lip Vibration, Whip; instruments shown: Alto Flute, Flute, C Trumpet, French Horn, Tuba, Oboe, Bassoon]
Schema II - Play Methods
[Tree, play methods shown: Muted, Pizzicato, Bowed, Picked, Shaken, Blow, ...; instruments shown: Piccolo, Flute, Bassoon, Alto Flute, ...]
Instrument granularity classifiers which are trained at each level of the hierarchical tree
Hornbostel/Sachs
We do not include membranophones because instruments in this family usually do not produce harmonic sound, so they need special techniques to be identified.
Modules of cascade classifier for single instrument estimation - Hornbostel/Sachs
Pitch 3B: 96.02% * 98.94% = 95.00% > 91.80%
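The figures on this slide appear to fit the reading 96.02% * 98.94% = 95.00% > 91.80%, that is, the product of two cascade-level confidences compared with a single non-cascade classifier. That interpretation is an assumption drawn from the surviving numbers and operators; under it, the arithmetic can be checked with a short sketch (the function name is hypothetical).

```python
import math


def cascade_confidence(level_confidences):
    """Overall confidence of a cascade, taken as the product of per-level confidences."""
    return math.prod(level_confidences)


cascade = cascade_confidence([0.9602, 0.9894])  # two levels, values from the slide
flat = 0.9180                                   # assumed to be the non-cascade result
print(f"{cascade:.4f}", cascade > flat)
```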
HIERARCHICAL STRUCTURE BUILT BY CLUSTERING ANALYSIS
Seven common methods to calculate the distance or similarity between clusters: single linkage (nearest neighbor), complete linkage (furthest neighbor), unweighted pair-group method using arithmetic averages (UPGMA), weighted pair-group method using arithmetic averages (WPGMA), unweighted pair-group method using the centroid average (UPGMC), weighted pair-group method using the centroid average (WPGMC), Ward's method.
Six most common distance functions: Euclidean, Manhattan, Canberra (examines the sum of a series of fractional differences between coordinates of a pair of objects), Pearson correlation coefficient (PCC; measures the degree of association between objects), Spearman's rank correlation coefficient, Kendall (counts the number of pairwise disagreements between two lists).
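The six measures can be written compactly as below. This is an illustrative sketch with textbook definitions, not the R code used for the experiments; the rank helper assumes no ties, and the correlation coefficients are similarities, so turning them into distances (for example 1 - r) is a common convention rather than something the slides state.

```python
import math


def euclidean(p, q):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def manhattan(p, q):
    return sum(abs(a - b) for a, b in zip(p, q))


def canberra(p, q):
    # Sum of |a - b| / (|a| + |b|); terms with a zero denominator are skipped.
    return sum(abs(a - b) / (abs(a) + abs(b))
               for a, b in zip(p, q) if abs(a) + abs(b) > 0)


def pearson(p, q):
    mp, mq = sum(p) / len(p), sum(q) / len(q)
    cov = sum((a - mp) * (b - mq) for a, b in zip(p, q))
    var_p = sum((a - mp) ** 2 for a in p)
    var_q = sum((b - mq) ** 2 for b in q)
    return cov / math.sqrt(var_p * var_q)


def ranks(values):
    # Rank 1 = smallest value; assumes no ties, for brevity.
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman(p, q):
    return pearson(ranks(p), ranks(q))


def kendall_disagreements(p, q):
    # Number of index pairs (i, j) that p and q order differently.
    n = len(p)
    return sum(1 for i in range(n) for j in range(i + 1, n)
               if (p[i] - p[j]) * (q[i] - q[j]) < 0)


p, q = [1.0, 2.0, 3.0, 4.0], [2.0, 1.0, 4.0, 3.0]   # hypothetical feature vectors
print(euclidean(p, q), manhattan(p, q), kendall_disagreements(p, q))
```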
Clustering algorithm – HCLUST (Agglomerative hierarchical clustering) – R Package
Clustering result from Hclust algorithm with Ward linkage method and Pearson distance measure; Flatness coefficients are used as the selected feature
“ctrumpet” and “batchtrumpet” are clustered in the same group. “ctrumpet_harmonStemOut” is clustered in a group of its own instead of merging with “ctrumpet”. Bassoon is considered a sibling of the regular French horn. “French horn muted” is clustered in a different group, together with “English Horn” and “Oboe”.
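How agglomerative clustering builds such groups can be sketched in plain Python. Note the substitutions: the slides use R's hclust with Ward linkage and a Pearson-based distance, while this sketch uses average linkage (UPGMA) over a precomputed distance matrix because it is shorter to write. The instrument names and distance values below are invented and are not the experimental data.

```python
def agglomerate(distance, n_clusters):
    """Average-linkage (UPGMA) clustering over a symmetric distance matrix.

    Returns a sorted list of clusters, each a sorted list of item indices.
    """
    clusters = [[i] for i in range(len(distance))]

    def linkage(a, b):
        # UPGMA: mean of all pairwise distances between the two clusters.
        return sum(distance[i][j] for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > n_clusters:
        # Find the closest pair of clusters and merge it.
        _, x, y = min((linkage(a, b), x, y)
                      for x, a in enumerate(clusters)
                      for y, b in enumerate(clusters) if x < y)
        merged = sorted(clusters[x] + clusters[y])
        clusters = [c for k, c in enumerate(clusters) if k not in (x, y)] + [merged]
    return sorted(clusters)


names = ["trumpet_a", "trumpet_b", "reed_a", "reed_b"]   # hypothetical labels
distance = [                                             # hypothetical distances
    [0.0, 0.1, 0.8, 0.9],
    [0.1, 0.0, 0.7, 0.8],
    [0.8, 0.7, 0.0, 0.2],
    [0.9, 0.8, 0.2, 0.0],
]
groups = agglomerate(distance, n_clusters=2)
print([[names[i] for i in group] for group in groups])
```

Recording the merge order instead of stopping at a fixed number of clusters would give the full hierarchy that the cascade classifier is trained on.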
Exp#  Classifier                 Method                                  Recall   Precision  F-Score
1     Non-Cascade                Single-label based on sound separation  31.48%   43.06%     36.37%
2     Non-Cascade                multi-label classification              85.51%   55.04%     66.97%
3     Cascade (Hornbostel)       multi-label classification              64.49%   63.10%     63.79%
4     Cascade (Playmethod)       multi-label classification              66.67%   55.25%     60.43%
5     Cascade (Machine Learned)  multi-label classification              63.77%   69.67%     66.59%
Looking for optimal [classification method data representation] in polyphonic music
Testing Data: 49 polyphonic sounds are created by selecting three different single instrument sounds from the training database and mixing them together.
KNN (k=3) is used as the classifier for each experiment.
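The F-Score column in the results table is consistent with the harmonic mean of recall and precision, F = 2RP / (R + P). The short check below recomputes it from the reported recall and precision values; only the function name is introduced here.

```python
def f_score(recall, precision):
    """Harmonic mean of recall and precision (same units as the inputs)."""
    return 2 * recall * precision / (recall + precision)


# (recall %, precision %, reported F-score %) for experiments 1-5.
rows = [
    (31.48, 43.06, 36.37),
    (85.51, 55.04, 66.97),
    (64.49, 63.10, 63.79),
    (66.67, 55.25, 60.43),
    (63.77, 69.67, 66.59),
]
for recall, precision, reported in rows:
    print(f"computed {f_score(recall, precision):.2f}, reported {reported:.2f}")
```

Experiment 2 has the highest recall but low precision, while experiment 5 is the most balanced, which is why their F-scores end up close.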
Auto indexing system for musical instruments
Intelligent query answering system for music instruments
WWW.MIR.UNCC.EDU
User entering query
User is not satisfied and he is entering a new query
- Action Rules System
Action Rule
Action rule is defined as a term [(ω) ∧ (α → β)] → (ϕ → ψ), where
(ω): conjunction of fixed condition features shared by both groups
(α → β): proposed changes in values of flexible features
(ϕ → ψ): desired effect of the action
Information System
A   B   D
a1  b2  d1
a2  b2
a2  b2  d2
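An action rule of the form [(ω) ∧ (α → β)] → (ϕ → ψ) can be represented as a small data structure and checked against the information system shown on this slide. The rule used below, [B = b2 ∧ (A, a1 → a2)] → (D, d1 → d2), is an illustrative assumption chosen to fit the table, not a rule quoted from the slides; the class and function names are likewise hypothetical, and the blank D value is kept as `None`.

```python
from dataclasses import dataclass


@dataclass
class ActionRule:
    fixed: dict      # ω: stable feature values shared by both groups
    changes: dict    # α → β: flexible feature -> (old value, new value)
    effect: tuple    # ϕ → ψ: (decision feature, old value, new value)


def matches(row, conditions):
    return all(row.get(name) == value for name, value in conditions.items())


def supporting_objects(rule, table):
    """Rows matching the rule's 'before' side and rows matching its 'after' side."""
    decision, old, new = rule.effect
    before = {**rule.fixed, **{f: v[0] for f, v in rule.changes.items()}, decision: old}
    after = {**rule.fixed, **{f: v[1] for f, v in rule.changes.items()}, decision: new}
    return ([r for r in table if matches(r, before)],
            [r for r in table if matches(r, after)])


table = [
    {"A": "a1", "B": "b2", "D": "d1"},
    {"A": "a2", "B": "b2", "D": None},   # D is blank in the slide's table
    {"A": "a2", "B": "b2", "D": "d2"},
]
rule = ActionRule(fixed={"B": "b2"}, changes={"A": ("a1", "a2")},
                  effect=("D", "d1", "d2"))
before, after = supporting_objects(rule, table)
print(len(before), len(after))
```

A rule is only useful if both groups are non-empty: objects that currently have the undesired decision value, and objects showing that the changed state co-occurs with the desired one.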
Action Rules Discovery
Meta-actions based decision system S(d) = (X, A ∪ {d}, V), with A = {A1, A2, …, Am}
Influence Matrix
     A1   A2   A3   A4   …   Am
M1   E11  E12  E13  E14  …   E1m
M2   E21  E22  E23  E24  …   E2m
M3   E31  E32  E33  E34  …   E3m
M4   E41  E42  E43  E44  …   E4m
…
Mn   En1  En2  En3  En4  …   Enm
Candidate action rule - r = [(A1, a1 → a1’) ∧ (A2, a2 → a2’) ∧ (A4, a4 → a4’)] ⇒ (d, d1 → d1’)
if E32 = [a2 → a2’], then E31 = [a1 → a1’], E34 = [a4 → a4’]
Rule r is supported & covered by M3
"Action Rules Discovery without pre-existing classification rules", Z.W. Ras, A. Dardzinska, Proceedings of RSCTC 2008 Conference, in Akron, Ohio, LNAI 5306, Springer, 2008, 181-190 http://www.cs.uncc.edu/~ras/Papers/Ras-Aga-AKRON.pdf
Since the window diminishes the signal at both edges, it leads to information loss due to the narrowing of the frequency spectrum. To preserve this information, consecutive analysis frames overlap in time. Empirical experiments show that the best overlap is two thirds of the window size.
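A two-thirds overlap of a 0.12s frame gives a hop of about 0.04s, which matches the sampling parameters stated earlier in the slides. The sketch below also illustrates one property of that choice: with a periodic Hamming window and three evenly spaced overlapping frames, every sample receives the same total weight (3 * 0.54 = 1.62), so no part of the signal stays attenuated. This is offered as supporting intuition, not as the authors' empirical argument; the function names and the 120-point window length are assumptions.

```python
import math


def hop_from_overlap(frame_sec, overlap_fraction):
    """Hop size implied by a frame length and the fraction shared with the next frame."""
    return frame_sec * (1 - overlap_fraction)


def overlapped_weight(n_points, n_shifts):
    """Total periodic-Hamming weight each sample gets from n_shifts evenly spaced frames."""
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / n_points)
              for n in range(n_points)]
    hop = n_points // n_shifts
    return [sum(window[(n + k * hop) % n_points] for k in range(n_shifts))
            for n in range(n_points)]


hop = hop_from_overlap(0.12, 2 / 3)                    # about 0.04 s
weights = overlapped_weight(n_points=120, n_shifts=3)  # hop of one third of the window
print(round(hop, 4), round(min(weights), 4), round(max(weights), 4))
```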
[Figure: Windowing; Hamming window and spectral leakage; overlapping analysis frames A and B along the time axis]