harmonically informed multi-pitch tracking zhiyao duan, jinyu han and bryan pardo eecs dept.,...
TRANSCRIPT
Harmonically Informed Multi-pitch Tracking
Zhiyao Duan, Jinyu Han and Bryan PardoEECS Dept., Northwestern Univ.
Interactive Audio Lab, http://music.cs.northwestern.edu
For presentation in ISMIR 2009, Kobe, Japan.
• Given polyphonic music played by several monophonic harmonic instruments
• Estimate a pitch trajectory for each instrument
The Multi-pitch Tracking Task
2Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu
Potential Applications
• Automatic music transcription• Harmonic source separation• Other applications
– Melody-based music search– Chord recognition– Music education– ……
3Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu
The 2-stage Standard Approach
• Stage 1: Multi-pitch Estimation (MPE) in each single frame
• Stage 2: Connect pitch estimates across frames into pitch trajectories
4Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu
…
Time
Fre
quen
cy
State of the Art
5Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu
• How far has existing work gone?– MPE is not very robust– Form short pitch trajectories (within a
note) according to local time-frequency proximity of pitch estimates
• Our contribution– A new MPE algorithm– A constrained clustering approach to
estimate pitch trajectories across multiple notes
System Overview
6Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu
0 10 20 30 40
40
50
60
70
80
90
Time (second)
Fre
quency (
MID
I num
ber)
0 10 20 30 40
40
50
60
70
80
90
Time (second)
Fre
que
ncy (
MID
I n
um
be
r)
0 10 20 30 40-1
0
1
Time (second)
Am
plit
ud
e
Frequency
Am
plitu
de
Multi-pitch Estimation in Single Frame
• A maximum likelihood estimation method
• Spectrum: peaks & the non-peak region
Best F0 estimate (a set of F0s)
Observed power spectrum
F0 hypothesis, (a set of F0s)
7Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu
True F0True F0
Likelihood Definition
Likelihood of observing these peaks
Likelihood of not having any harmonics in the NP region
8Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu
F0 Hyp F0 Hyp
is large is small
is large is small
Likelihood Definition
Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu 9
True F0
F0 Hyp
Likelihood of observing these peaks
Likelihood of not having any harmonics in the NP region
is large is large
Pitch Trajectory Formation
10
Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu
• How to form pitch trajectories ?– View it as a constrained clustering problem!
• We use two clustering cues– Global timbre consistency– Local time-frequency locality
Global Timbre Consistency
• Objective function– Minimize intra-cluster distance
• Harmonic structure feature– Normalized relative amplitudes of harmonics
Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu 111st Component
2nd
Com
pone
nt
PCA of Harmonic Structures
ViolinClarinetSaxophoneBassoon
0 10 20 30 40 500
20
40
60
80
100
Harmonic number
Am
plitu
de (
dB)
Harmonic Structure
Local Time-frequency Locality
• Constraints– Must-link: similar pitches in adjacent frames– Cannot-link: simultaneous pitches
• Finding a feasible clustering is NP-hard!
Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu 12
…
Time
Fre
quen
cy
Our Constrained Clustering Process
• 1) Find an initial clustering– Labeling pitches according to pitch order in
each frame: First, second, third, fourth
Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu 13
…
Time
Fre
quen
cy
…
Time
Fre
quen
cy
Our Constrained Clustering Process
• 2) Define constraints– Must-link: similar pitches in adjacent
frames and the same initial cluster: Notelet
– Cannot-link: simultaneous notelets
Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu 14
…
Time
Fre
quen
cy
Our Constrained Clustering Process• 3) Update clusters to minimize objective
function– Swap set: A set of notelets in two clusters
connected by cannot-links– Swap notelets in a swap set between clusters if
it reduces objective function– Iteratively traverse all the swap sets
Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu 15
…
Time
Fre
quen
cy
Data Set
• Data set– 10 J.S. Bach chorales (quartets, played by
violin, clarinet, saxophone and bassoon)– Each instrument is recorded individually, then
mixed
• Ground-truth pitch trajectories– Use YIN on monophonic tracks before mixing
16Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu
Experimental Results
17Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu
Mean +- Std Precision (%) Recall (%)
How many pitches are correctly estimated?
Klapuri, ISMIR2006
87.2 +- 2.0 66.2 +- 3.4
Ours 88.6 +- 1.7 77.0 +- 3.5
How many pitches are correctly estimated and put into the correct trajectory?
Chance Approx 0.0 Approx 0.0
Ours 76.9 +- 11.0 67.1 +- 11.9
How many notes are correctly estimated?
Chance Approx 0.0 Approx 0.0
Ours 46.0 +- 5.5 54.3 +- 5.5
Ground Truth Pitch Trajectories
18Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu
J.S. Bach, “Ach lieben Christen, seid getrost”
Our System’s Output
19Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu
J.S. Bach, “Ach lieben Christen, seid getrost”
Conclusion
• Our multi-pitch tracking system– Multi-pitch estimation in single frame
• Estimate F0s by modeling peaks and the non-peak region
• Estimate polyphony, refine F0s estimates
– Pitch trajectory formation• Constrained clustering
– Objective: timbre (harmonic structure) consistency– Constraints: local time-frequency locality of pitches
• A clustering algorithm by swapping labels
• Results on music recordings are promising
20Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu
Possible Questions
• How much does our constrained clustering algorithm improve from the initial pitch trajectory (label pitches by pitch order)?
22Northwestern University, Interactive Audio Lab. http://music.cs.northwestern.edu
1 2 3 40
20
40
60
80
100
Track No.
Acc
urac
y (%
)
Initial trajectoryFinal trajectory