Automatic transcription of polyphonic piano music using a note masking technique
Mr Ronan Kelly and Dr Jacqueline Walker
Department of Electronic & Computer Engineering
University of Limerick
Overview
• Music transcription
• Our approach
• Onset detection
• Algorithm
• Results
• Conclusions
Music Transcription
• Complex cognitive task
Example: Top of the Pops!
• A challenging task for a computer, but one which pushes the boundaries of signal processing, pattern recognition, machine learning, …
Monophonic Music Transcription
• A solved problem
– Sliding window-based analysis of melody line
– Steps
  – Decimate (reduce data)
  – Onset detection
  – FFT or constant Q transform
  – Note detection
Polyphonic Music Transcription
• Multiple simultaneous notes
• In Western Tonal Music (WTM), notes played together almost inevitably share harmonics
• Impact of rhythms, held notes
• Possibility of multiple instruments
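To illustrate the point about shared harmonics, here is a minimal sketch that lists the harmonics of two notes that nearly coincide. The equal-temperament frequencies, the number of harmonics and the 2 Hz tolerance are assumptions chosen for illustration, not values from the slides.

```python
def harmonics(f0, count):
    """Return the first `count` harmonic frequencies (Hz) of fundamental f0."""
    return [f0 * k for k in range(1, count + 1)]


def shared_harmonics(f0_a, f0_b, count=8, tol_hz=2.0):
    """Return (k_a, k_b, freq_a, freq_b) for every pair of harmonics
    of notes A and B that lie within tol_hz of each other."""
    pairs = []
    for ka, fa in enumerate(harmonics(f0_a, count), start=1):
        for kb, fb in enumerate(harmonics(f0_b, count), start=1):
            if abs(fa - fb) <= tol_hz:
                pairs.append((ka, kb, fa, fb))
    return pairs


# Assumed equal-temperament fundamentals (A4 = 440 Hz).
C4, G4 = 261.63, 392.00
overlaps = shared_harmonics(C4, G4)
for ka, kb, fa, fb in overlaps:
    print(f"C4 harmonic {ka} ({fa:.1f} Hz) ~ G4 harmonic {kb} ({fb:.1f} Hz)")
```

Under these assumptions the third harmonic of C4 falls within 1 Hz of the second harmonic of G4, so a single spectral peak there cannot be attributed to one note without further information.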
Approaches to Polyphonic Transcription
• Human audition based
– Martin Cooke's "Modelling Auditory Processing and Organisation", 1993
– Brown & Cooke, "Computational Auditory Scene Analysis", 1994
• Signal processing based
– Tanguiane, "Artificial Perception and Music Recognition", 1993
– Klapuri et al., since 1998
Our Approach
• Onset Detection
• Note Window & FFT
• Masking Scheme Iteration
Onset Detection
• NAE (Note Average Energy) onset detection [1]
[1] Liu, R., Griffith, J., Walker, J. & Murphy, P., "Time Domain Note Average Energy Based Music Onset Detection", Proceedings of the Stockholm Music Acoustics Conference (SMAC 03), Stockholm, Sweden, August 6-9, 2003.
Figure 3: Energy e(t) (b), average energy a(t) (c) and note average energy NAE(t) (d) of the power envelope p(t) (a).
In practice, we search for local minima…
$$NAE(t) = \frac{1}{t - t_n} \int_{t_n}^{t} p(\tau)\, d\tau, \qquad t_n < t < t_{n+1}$$
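A minimal discrete sketch of the note average energy follows. It assumes that NAE(t) is the mean of the power envelope p since the most recent onset t_n, that the envelope is uniformly sampled, and that the local minima being searched for are minima of the NAE curve. The function names are hypothetical and the slides do not give the implementation.

```python
def note_average_energy(power, onset_index):
    """Running mean of the power envelope from onset_index onward.

    power: list of uniformly spaced power-envelope samples p[k].
    Returns a list nae where nae[i] is the mean of
    power[onset_index : onset_index + i + 1].
    """
    nae = []
    running_sum = 0.0
    for count, value in enumerate(power[onset_index:], start=1):
        running_sum += value
        nae.append(running_sum / count)
    return nae


def local_minima(values):
    """Indices i where values[i] is strictly below both neighbours."""
    return [i for i in range(1, len(values) - 1)
            if values[i] < values[i - 1] and values[i] < values[i + 1]]


# Toy envelope (illustrative numbers only): energy decays, then a new note starts.
envelope = [4.0, 2.0, 0.0, 6.0]
nae = note_average_energy(envelope, onset_index=0)
candidate_onsets = local_minima(nae)
```

In this toy envelope the running mean falls while the note decays and rises again when new energy arrives, so the minimum marks a candidate onset.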
Note Window
• FFT performed on the whole note
• Avoids start-of-note and end-of-note effects
• Gives greater robustness against noise
Algorithm for Masking Scheme - 1
FFT on note window
Find max peak in window
Remove peak from window; add to list
Continue until no peaks above threshold
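The peak-extraction loop can be sketched as follows. This is an illustration under stated assumptions: the FFT magnitudes are taken as already computed and reduced to a frequency-to-amplitude mapping, and the threshold value and the 600 Hz entry are hypothetical. A real implementation would probably also remove the bins neighbouring each peak.

```python
def extract_peaks(spectrum, threshold):
    """Repeatedly take the largest remaining peak until none exceeds threshold.

    spectrum: dict mapping frequency (Hz) -> amplitude for one note window.
    Returns a list of (frequency, amplitude) in extraction order.
    """
    remaining = dict(spectrum)  # copy so the caller's data is untouched
    peaks = []
    while remaining:
        freq = max(remaining, key=remaining.get)
        amp = remaining[freq]
        if amp <= threshold:
            break  # no peaks above threshold are left
        peaks.append((freq, amp))
        del remaining[freq]  # remove the peak from the window
    return peaks


# Peak values from the C4, E4, G4 example; the 600 Hz entry is a made-up noise bin.
detected = {262: 11.2, 330: 21.4, 392: 29.9, 523: 7.1, 600: 0.8}
peak_list = extract_peaks(detected, threshold=5.0)
```

With these inputs the first peak extracted is 29.9 at 392 Hz and the second is 21.4 at 330 Hz, in line with the worked example in the slides.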
Algorithm for Masking Scheme - 2
Apply mask to first (lowest) frequency in list
Adjust amplitudes of all affected frequencies by mask
Add frequency to note list; move to next frequency
Continue until list is empty
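A minimal sketch of the masking iteration is given below. The slides do not state the exact adjustment rule, so this assumes that each mask entry subtracts its fraction of the fundamental's amplitude from any peak within a small tolerance, and that peaks falling to or below a floor are discarded. The function name, tolerance and floor are hypothetical; only the C4 mask fractions (72% and 41%) come from the slides, and the other notes are given no mask here for brevity.

```python
def apply_masks(peaks, masks, tol_hz=2.0, floor=1.0):
    """Identify notes by masking from the lowest frequency upward.

    peaks: list of (frequency, amplitude) tuples.
    masks: dict fundamental -> {harmonic frequency: fraction of fundamental}.
    Returns the list of frequencies accepted as played notes.
    """
    remaining = sorted(peaks)  # lowest frequency first
    notes = []
    while remaining:
        f0, a0 = remaining.pop(0)
        notes.append(f0)
        mask = masks.get(f0, {})  # assumption: no mask means no adjustment
        adjusted = []
        for freq, amp in remaining:
            for mask_freq, fraction in mask.items():
                if abs(freq - mask_freq) <= tol_hz:
                    amp -= fraction * a0  # assumed subtraction rule
            if amp > floor:
                adjusted.append((freq, amp))
        remaining = adjusted
    return notes


c4_mask = {523: 0.72, 784: 0.41}
peaks = [(262, 11.2), (330, 21.4), (392, 29.9), (523, 7.1)]
notes = apply_masks(peaks, {262: c4_mask})
```

Under this assumed rule the 523 Hz peak is removed entirely as a harmonic of C4, whereas the worked example in the slides leaves a residue of 3.1, so the authors' adjustment evidently differs in detail.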
Masking Scheme - 1
C4, E4, G4 (262 Hz, 330 Hz, 392 Hz)
Max. peak amplitude = 29.9 @ 392 Hz (G4)
Next peak amplitude = 21.4 @ 330 Hz (E4)
Masking Scheme - 2
[Chart: detected frequency peaks; amplitude vs. frequency (Hz) at 262, 330, 392 and 523 Hz]
Detected frequency peaks
Frequency (Hz)   Amplitude
262              11.2
330              21.4
392              29.9
523              7.1
[Chart: note mask; relative amplitude vs. frequency (Hz) at 261, 523 and 784 Hz]
Note mask
Frequency (Hz)   Amplitude
260, 261, 262    100%
523, 524         72%
784, 785         41%
Masking Scheme - 3
[Chart "Masking action": C4 mask overlaid on the detected values; amplitude vs. frequency (Hz) at 262, 330, 392 and 523 Hz]
[Chart "After masking": remaining detected values; amplitude vs. frequency (Hz)]
Remaining detected values
Frequency (Hz)   Amplitude
330              21.4
392              29.9
523              3.1
Note played: C4
Building a Note Mask - 1
A note is played with other notes and the significant frequency peaks and amplitudes are recorded
(harmonics of D4 shown in red; D4 harmonics in common with the other note shown in blue)
Building a Note Mask - 2
[Chart "D4 and C4": amplitude vs. frequency (Hz) at 262, 523, 785, 1047, 1309, 1570 and 1832 Hz; series: D4 values, C4 values]
[Chart "D4 and A4": amplitude vs. frequency (Hz) at 294, 587, 1174, 1469 and 2056 Hz; series: D4 values, A4 values, D4 + A4 values]
Building a Note Mask - 3
Extract values unique to D4 and normalise to amplitude of highest peak:
Frequency (Hz)  D4, C4  D4, E4  D4, F4  D4, G4  D4, A4  D4, B4
294 1 1 1 1 1 1
587 0.70 0.67 0.76 0.75 0.84 0.65
881 0.38 0.37 0.44 0.44 0.40
1175 0.11 0.12
1468 0.17 0.16 0.15 0.17 0.14
1762 0.12 0.11 0.12
2056 0.27 0.25 0.28 0.28 0.30 0.18
Building a Note Mask - 3
Average across samples:
[Chart: D4 mask; amplitude (% of fundamental frequency) vs. frequency (Hz)]
D4 Mask
Frequency (Hz)   Amplitude
294              100%
587              72.69%
881              40.63%
1175             11.49%
1468             15.93%
1762             11.61%
2056             26.03%
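The averaging step can be sketched as below, using the normalised D4 values tabulated for the six note pairs. Averaging each harmonic over only those samples in which it was recorded is an assumption; it reproduces the mask above to within rounding of the tabulated inputs. The function name is hypothetical.

```python
def build_mask(samples_by_freq):
    """Average normalised amplitudes per harmonic and express them in percent.

    samples_by_freq: dict frequency (Hz) -> list of amplitudes, each already
    normalised to the fundamental, one per recording where the peak was usable.
    """
    return {freq: 100.0 * sum(values) / len(values)
            for freq, values in samples_by_freq.items() if values}


# Normalised D4 values from the table (two-decimal rounding as shown there).
d4_samples = {
    294: [1, 1, 1, 1, 1, 1],
    587: [0.70, 0.67, 0.76, 0.75, 0.84, 0.65],
    881: [0.38, 0.37, 0.44, 0.44, 0.40],
    1175: [0.11, 0.12],
    1468: [0.17, 0.16, 0.15, 0.17, 0.14],
    1762: [0.12, 0.11, 0.12],
    2056: [0.27, 0.25, 0.28, 0.28, 0.30, 0.18],
}
d4_mask = build_mask(d4_samples)
for freq in sorted(d4_mask):
    print(f"{freq} Hz: {d4_mask[freq]:.2f}%")
```

The small differences from the mask values on the slide are consistent with the table entries having been rounded to two decimals before averaging here.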
Experimental Set-up
• Keyboard used: Technics KN800 PCM Keyboard
• Note range: C2 to B6
• Recording – direct using line-in
• Isolated chords and polyphonic music samples
Results
How to define error?
Need to account for both missed notes (m) and spurious notes (x)
$$E\% = \frac{m + x}{n} \times 100$$
n is number of notes detected – not number of notes played
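As a quick check of the error measure, the sketch below computes the total error as missed plus spurious notes divided by the number of notes detected, times 100, and applies it to the isolated-chord figures reported in the results. The function name is hypothetical.

```python
def total_error_percent(missed, spurious, detected):
    """Total error (%) = 100 * (m + x) / n, with n the number of notes detected."""
    if detected <= 0:
        raise ValueError("detected must be positive")
    return 100.0 * (missed + spurious) / detected


# (missed, spurious, detected) for the three isolated-chord rows.
chord_rows = [(18, 0, 225), (15, 5, 638), (69, 77, 1906)]
chord_errors = [round(total_error_percent(m, x, n), 1) for m, x, n in chord_rows]
```

Because n counts detected rather than played notes, spurious detections increase both the numerator and the denominator.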
Results – Isolated Chords
                    Notes played  Notes detected  Missed notes  Spurious notes  Total error (%)
Chords, 5-8 notes   243           225             18            0               8.0
Chords, 3-4 notes   648           638             15            5               3.1
Chords              1898          1906            69            77              7.7
Results – Polyphonic Music
                      Notes played  Notes detected  Missed notes  Spurious notes  Total error (%)
Danny Boy (slow)      87            94              7             14              22
Danny Boy (moderate)  91            98              8             15              23.5
Danny Boy (fast)      90            99              8             17              25
Effect of Onset Detection
• Effective onset detection is crucial
• Two types of errors:
– Extra onset: less likely to cause a problem, but the note is divided up too finely
– Missing onset: note windows not placed 'correctly'
Results with Onset Detection
                      Notes played  Notes detected  Missed notes  Spurious notes  Total error (%)
Danny Boy (slow)      87            120             10            43              44
Danny Boy (moderate)  91            120             17            28              44
Danny Boy (fast)      90            120             23            37              58
Future Work
• Develop model for note combinations (polyphonic note masks)
• Use wider range of note combinations
• Develop an efficient approach to applying polyphonic note masks
• Improve note onset detection