TRANSCRIPT
E4896 Music Signal Processing (Dan Ellis) 2013-03-11 - /18
Lecture 8: Pitch Tracking
Dan Ellis, Dept. of Electrical Engineering, Columbia University
[email protected] http://www.ee.columbia.edu/~dpwe/e4896/
1. Pitch Tracking  2. Spectral Approaches  3. Time Domain  4. Example Algorithms
ELEN E4896 MUSIC SIGNAL PROCESSING
1. Pitch Tracking
• Pitch is a big part of hearing
  - orthogonal to formants (vowels) in speech
  - follows the fundamental frequency (f0) of sounds
  - because of speech? because of periodicity?
• Pitch extraction (tracking) is useful
  - for coding / representation (telephony)
  - for extracting information
• ... but can be tricky
  - sounds without a clear f0
Challenges to Pitch Extraction
• Pitch extraction challenges include:
  - noise / multiple f0s
  - voice activity detection
  - unclear f0
  - note segmentation
[Figure: spectrogram (freq/kHz vs time/s), spectra (dB ampl. vs freq/Hz), and waveform excerpts of the utterance “Oh, I’m just about to lose my mind...”]
Talkin, 1995
Multiple f0s
• Common in music (“polyphony”)
  - cross-terms...
  - identify-remove-repeat?
[Figure: spectrograms (Frequency vs Time) of polyphonic audio]
2. Spectral Approaches
• Just pick the most intense spectral peak?
  - the fundamental may not be the strongest...
[Figure: spectrograms (freq/kHz vs time/s) where upper harmonics outweigh the fundamental]
Spectral Templates
• “Filter” the log-frequency spectrum with a template: a matched filter...
[Figure: log-frequency spectrogram (250–8000 Hz, time/s) convolved (*) with a harmonic template (=) to give a pitch estimate]
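The template scoring can be sketched in a few lines of NumPy: score each candidate f0 by summing spectral magnitude at its harmonics, in effect a matched filter against a harmonic comb. This is a minimal illustration; the grid size, harmonic count, and FFT length are my choices, not the lecture's.

```python
import numpy as np

def harmonic_template_pitch(x, sr, fmin=100.0, fmax=1000.0, n_harm=5, n_fft=4096):
    """Score each candidate f0 by summing spectral magnitude at its
    harmonics: a crude matched filter against a harmonic template."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x)), n_fft))
    cands = np.geomspace(fmin, fmax, 200)          # log-spaced f0 candidates
    scores = []
    for f0 in cands:
        # FFT bins of the first n_harm harmonics of this candidate
        bins = np.round(np.arange(1, n_harm + 1) * f0 * n_fft / sr).astype(int)
        scores.append(spec[bins[bins < len(spec)]].sum())
    return cands[int(np.argmax(scores))]

# synthetic harmonic tone at 220 Hz, amplitudes falling off as 1/k
sr = 8000
t = np.arange(4096) / sr
x = sum(np.sin(2 * np.pi * 220 * k * t) / k for k in range(1, 5))
f0 = harmonic_template_pitch(x, sr)
```

Because the whole harmonic pattern votes for the candidate, this finds the f0 even when the fundamental itself is weak.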
Cepstrum
• Spectrum shows periodicity at f0
  - reveal it via the Fourier transform of log|Y| = the cepstrum
[Figure: log spectrum (freq/Hz), cepstrum (quefrency), and cepstrogram (quefrency vs time/s)]
Noll, 1967
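A minimal cepstral pitch estimator along these lines: take the inverse FFT of the log magnitude spectrum and pick the strongest quefrency in a plausible pitch band. The search range and the small spectral floor are illustrative choices I've added, not part of the original method.

```python
import numpy as np

def cepstral_pitch(x, sr, fmin=80.0, fmax=300.0):
    """Cepstrum pitch: the FT of log|spectrum| turns the regular harmonic
    ripple into a peak at quefrency 1/f0 (in samples, sr/f0)."""
    n = len(x)
    spec = np.abs(np.fft.rfft(x * np.hanning(n)))
    log_spec = np.log(spec + 1e-3 * spec.max())    # floor keeps the log bounded
    ceps = np.fft.irfft(log_spec)
    qmin, qmax = int(sr / fmax), int(sr / fmin)    # quefrency search band
    peak_q = qmin + int(np.argmax(ceps[qmin:qmax]))
    return sr / peak_q

sr = 16000
t = np.arange(4096) / sr
x = sum(np.sin(2 * np.pi * 150 * k * t) / k for k in range(1, 8))  # 150 Hz tone
f0 = cepstral_pitch(x, sr)
```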
“Specmurt”
• Log-spectrum is like a convolution of pitch peaks with a harmonics profile
  - especially for multiple notes from one instrument (piano)
• Separate out by deconvolution
  - inverse Fourier transform of the log-frequency spectrum V(y)
  - divide out the harmonic template H(y)
Sagayama et al., 2004
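A toy specmurt deconvolution on a synthetic log-frequency spectrum: on a log-frequency axis every note shows the same harmonic pattern, so dividing in the Fourier domain recovers the pitch distribution. The Wiener-style `eps` regularization is an assumption I've added to keep the division stable, not part of Sagayama's formulation.

```python
import numpy as np

def specmurt(log_f_spectrum, harmonic_profile, eps=1e-3):
    """Deconvolve the observed log-frequency spectrum by the common
    harmonic profile: divide their Fourier transforms (regularized)."""
    V = np.fft.fft(log_f_spectrum)
    H = np.fft.fft(harmonic_profile, len(log_f_spectrum))
    U = V * np.conj(H) / (np.abs(H) ** 2 + eps)
    return np.real(np.fft.ifft(U))

# toy example: 36 bins/octave, harmonics 1..4, two notes at bins 10 and 30
bins_per_oct = 36
ks = np.arange(1, 5)
offsets = np.round(bins_per_oct * np.log2(ks)).astype(int)   # 0, 36, 57, 72
profile = np.zeros(offsets.max() + 1)
profile[offsets] = 1.0 / ks
obs = np.zeros(200)
for note_bin, amp in [(10, 1.0), (30, 0.7)]:
    obs[note_bin + offsets] += amp * profile[offsets]        # exact convolution
pitches = specmurt(obs, profile)
```

The recovered `pitches` array has peaks back at bins 10 and 30, i.e. a piano-roll-like view of the two notes.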
Sinusoid Tracks
• Do sinusoid tracking before pitch tracking
  - apply templates / sieve on the tracks
  - consider each strong sinusoid as a candidate
  - limits analysis to clearly periodic energy (normalized?)
Maher & Beauchamp 1994
[Figure: spectrogram and extracted sinusoid tracks (freq/kHz vs time/s)]
3. Time-Domain Approaches
• Periodicity is primarily a time-domain phenomenon
  - time-domain processing avoids the FFT
• Early approaches: look for a common gap between waveform features
[Figure: waveform excerpts (ampl. vs time/s)]
Gold & Rabiner 1969
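In that spirit, a crude feature-interval estimator: find prominent waveform peaks and take the median gap between them. The threshold and peak rule here are arbitrary illustrative choices, far simpler than Gold & Rabiner's parallel estimators.

```python
import numpy as np

def peak_interval_pitch(x, sr):
    """Time-domain sketch: pick prominent positive peaks and estimate
    the period as the median gap between successive peaks."""
    thresh = 0.5 * x.max()
    # strict local maxima above the threshold (indices into x)
    peaks = np.flatnonzero((x[1:-1] > x[:-2]) & (x[1:-1] >= x[2:]) &
                           (x[1:-1] > thresh)) + 1
    gaps = np.diff(peaks)
    return sr / np.median(gaps)

sr = 8000
t = np.arange(2000) / sr
x = np.sin(2 * np.pi * 200 * t) + 0.3 * np.sin(2 * np.pi * 400 * t)  # 200 Hz
f0 = peak_interval_pitch(x, sr)
```

The median makes the estimate robust to the occasional spurious or missed peak, which is exactly where such waveform-feature schemes tend to fail.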
Autocorrelation
• Directly measures the period of self-similarity
  - (the autocorrelation is the inverse FT of the power spectrum)
[Figure: spectrogram, autocorrelograms (lag/ms vs time/s), waveform excerpts, and autocorrelation functions (lag/s)]
Sondhi 1968
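A bare-bones autocorrelation tracker: pick the lag of the strongest autocorrelation peak within a plausible pitch range. The search bounds are illustrative; practical systems add windowing, normalization, and smoothing.

```python
import numpy as np

def autocorr_pitch(x, sr, fmin=60.0, fmax=500.0):
    """Estimate f0 as sr / lag, where lag is the strongest
    autocorrelation peak in the allowed range."""
    x = x - x.mean()
    r = np.correlate(x, x, mode='full')[len(x) - 1:]   # lags 0 .. N-1
    lmin, lmax = int(sr / fmax), int(sr / fmin)
    lag = lmin + int(np.argmax(r[lmin:lmax]))
    return sr / lag

sr = 8000
t = np.arange(2048) / sr
x = sum(np.sin(2 * np.pi * 160 * k * t) / k for k in range(1, 6))
f0 = autocorr_pitch(x, sr)
```

Excluding lags below `sr/fmax` matters: the autocorrelation is always largest at lag 0, so the search must start past it.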
Post-processing
• Can assume and exploit pitch smoothness
• Median filtering
  - only knows the per-frame best point
• Dynamic programming (Viterbi path)
  - can look at all underlying candidate values
[Figure: raw, median-31, and Viterbi-smoothed pitch tracks (lag/ms vs time/s)]
Talkin, 1995
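Both smoothers can be sketched directly. The Viterbi transition penalty on log-lag jumps is an assumed cost for illustration, not Talkin's exact formulation.

```python
import numpy as np

def median_smooth(track, width=5):
    """Median filter: removes isolated octave errors, but can only ever
    output one of the observed per-frame values."""
    pad = width // 2
    padded = np.pad(track, pad, mode='edge')
    return np.array([np.median(padded[i:i + width]) for i in range(len(track))])

def viterbi_smooth(candidates, scores, trans_penalty=1.0):
    """Tiny Viterbi: each frame has several candidate lags with scores;
    pick the path maximizing score minus |log-lag jump| penalties."""
    n_frames, n_cand = scores.shape
    loglag = np.log(candidates)
    jump = trans_penalty * np.abs(loglag[None, :] - loglag[:, None])
    cost = np.empty((n_frames, n_cand))
    back = np.zeros((n_frames, n_cand), dtype=int)
    cost[0] = scores[0]
    for t in range(1, n_frames):
        total = cost[t - 1][:, None] - jump        # rows: prev, cols: current
        back[t] = np.argmax(total, axis=0)
        cost[t] = scores[t] + np.max(total, axis=0)
    path = [int(np.argmax(cost[-1]))]
    for t in range(n_frames - 1, 0, -1):           # trace back the best path
        path.append(int(back[t][path[-1]]))
    return candidates[np.array(path[::-1])]

track = np.array([4.0, 4, 4, 8, 4, 4])             # lag/ms with one octave blip
smoothed = median_smooth(track, 3)

cands = np.array([4.0, 8.0])                       # candidate lags per frame
scores = np.array([[1, 0], [1, 0], [0.4, 0.6], [1, 0]], dtype=float)
path = viterbi_smooth(cands, scores)
```

The Viterbi version keeps the weaker 4 ms candidate in frame 3 because the jump penalty outweighs its slightly lower score, which is exactly what "looking at all underlying values" buys over the median.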
4. Examples: YIN
• Like autocorrelation, but:
  - difference function: robust to changes in amplitude
  - progressive normalization by the cumulative average: eliminates the lag = 0 advantage
  - outperforms others without post-processing
[Figure: YIN output: pitch (octaves re: 440 Hz), sqrt(aperiodicity), and sqrt(power) vs time/s]
de Cheveigné & Kawahara, 2002
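The two YIN ingredients named above, the difference function and its cumulative-mean normalization, in minimal form. The absolute threshold of 0.1 and the walk down to the local minimum are simplifications of the published algorithm (which also interpolates the minimum).

```python
import numpy as np

def yin_pitch(x, sr, fmin=60.0, fmax=500.0, thresh=0.1):
    """YIN sketch: squared-difference function d(tau), then the
    cumulative-mean-normalized d'(tau) = d(tau) * tau / sum_{1..tau} d,
    which equals 1 at lag 0 and so removes the lag-0 advantage."""
    max_lag = int(sr / fmin)
    w = len(x) - max_lag                           # comparison window length
    d = np.array([np.sum((x[:w] - x[tau:tau + w]) ** 2)
                  for tau in range(max_lag + 1)])
    cmndf = np.ones_like(d, dtype=float)
    cmndf[1:] = d[1:] * np.arange(1, max_lag + 1) / np.maximum(
        np.cumsum(d[1:]), 1e-12)
    lmin = int(sr / fmax)
    below = np.flatnonzero(cmndf[lmin:] < thresh)  # first dip below threshold
    lag = lmin + (int(below[0]) if len(below) else int(np.argmin(cmndf[lmin:])))
    while lag + 1 < len(cmndf) and cmndf[lag + 1] < cmndf[lag]:
        lag += 1                                   # walk down to the local min
    return sr / lag

sr = 8000
t = np.arange(2048) / sr
x = sum(np.sin(2 * np.pi * 250 * k * t) / k for k in range(1, 5))
f0 = yin_pitch(x, sr)
```

Taking the *first* sufficiently deep dip, rather than the global minimum, is what protects YIN against choosing a multiple of the period.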
Examples: sigmund~
• Miller Puckette’s Pd pitch tracker
• Sinusoids + sieve
• Also performs note segmentation and polyphony
Puckette et al. 1998
L(f) = Σ_k 1/(α + k) · Y(k·f)
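A toy harmonic sieve over a list of sinusoid peaks: each measured sinusoid lying near a harmonic of the candidate f0 contributes its amplitude, down-weighted as the harmonic number grows. The 1/(α + k) weight, tolerance, and harmonic count here are assumptions for illustration, not sigmund~'s actual constants.

```python
import numpy as np

def sieve_likelihood(peaks, f0, alpha=1.0, n_harm=8, tol=0.01):
    """Score a candidate f0 against sinusoid (freq, amp) pairs: a peak
    within `tol` (relative) of harmonic k*f0 adds amp / (alpha + k)."""
    score = 0.0
    for freq, amp in peaks:
        k = int(round(freq / f0))
        if 1 <= k <= n_harm and abs(freq - k * f0) <= tol * k * f0:
            score += amp / (alpha + k)
    return score

# sinusoid tracks measured from a (synthetic) 220 Hz note
peaks = [(220.0, 1.0), (441.0, 0.5), (662.0, 0.33), (879.0, 0.25)]
cands = np.arange(100.0, 500.0, 5.0)
best = cands[int(np.argmax([sieve_likelihood(peaks, f) for f in cands]))]
```

Working from a short peak list instead of a full spectrum is what lets a sieve like this run cheaply in real time.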
Examples: Weighted Sieve Polyphonic
• Based on a more complex multiple-f0 system
• Chooses f0 via a weighted sieve
  - the spectrum is whitened before measuring peaks
  - weights are optimized on training data
• Multiple f0s are found via cancel-and-repeat, or joint optimization
Klapuri 2006
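The cancel-and-repeat loop can be sketched as follows. This simplified stand-in uses plain (unweighted) harmonic sums and an ad-hoc cancellation width where Klapuri uses whitening and a trained weighted sieve.

```python
import numpy as np

def iterative_f0s(spec, sr, n_fft, n_f0=2, fmin=80.0, fmax=600.0, n_harm=6):
    """Cancel-and-repeat sketch: find the f0 with the largest harmonic
    sum, zero a neighborhood around each of its harmonics, repeat."""
    spec = spec.copy()
    cands = np.geomspace(fmin, fmax, 300)
    found = []
    for _ in range(n_f0):
        scores = []
        for f0 in cands:
            bins = np.round(np.arange(1, n_harm + 1) * f0 * n_fft / sr).astype(int)
            scores.append(spec[bins[bins < len(spec)]].sum())
        f0 = cands[int(np.argmax(scores))]
        found.append(float(f0))
        for k in range(1, n_harm + 1):             # cancel its harmonics
            b = int(round(k * f0 * n_fft / sr))
            spec[max(b - 2, 0):b + 3] = 0.0
    return found

# two-note mixture: 200 Hz and 310 Hz, four harmonics each
sr, n_fft = 8000, 4000
t = np.arange(n_fft) / sr
x = (sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 5)) +
     sum(0.8 * np.sin(2 * np.pi * 310 * k * t) / k for k in range(1, 5)))
f0s = sorted(iterative_f0s(np.abs(np.fft.rfft(x, n_fft)), sr, n_fft))
```

The cancellation step is what suppresses the cross-terms mentioned earlier: once the dominant note's harmonics are removed, the second estimate is no longer pulled toward shared partials.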
Example: SAcC
• Subband Autocorrelation Classification
  - multiple subbands
  - autocorrelation + PCA
  - a trained MLP classifier to find the pitch
[Diagram: Sound → cochlea filterbank → short-time autocorrelation over frequency channels (correlogram slice: frequency × lag, over time) → MLP classifier → HMM pitch tracker → Pitch]
Lee & Ellis 2012
Summary
• Pitch: important perceptually, for speech and music
• Spectrum: combine multiple harmonics to find f0
• Time: autocorrelation directly measures periodicity
References
• A. de Cheveigné & H. Kawahara, “YIN, a Fundamental Frequency Estimator for Speech and Music,” J. Acoust. Soc. Am. 111, 1917–1930, 2002.
• B. Gold & L. Rabiner, “Parallel Processing Techniques for Estimating Pitch Periods of Speech in the Time Domain,” J. Acoust. Soc. Am. 46(2), 442–448, 1969.
• A. Klapuri, “Multiple Fundamental Frequency Estimation by Summing Harmonic Amplitudes,” Proc. Int. Symp. Music IR, 216–221, 2006.
• B. S. Lee & D. Ellis, “Noise Robust Pitch Tracking by Subband Autocorrelation Classification,” Proc. Interspeech-12, P3b.05, 2012.
• R. Maher & J. Beauchamp, “Fundamental frequency estimation of musical signals using a two-way mismatch procedure,” J. Acoust. Soc. Am. 95(4), 2254–2263, 1994.
• A. Noll, “Cepstrum Pitch Determination,” J. Acoust. Soc. Am. 41, 293–309, 1967.
• M. Puckette, T. Apel, D. Zicarelli, “Real-time audio analysis tools for Pd and MSP,” Proc. Int. Comp. Music Conf., 109–112, 1998.
• S. Sagayama, K. Takahashi, H. Kameoka and T. Nishimoto, “Specmurt Anasylis: A Piano-Roll-Visualization of Polyphonic Music Signal by Deconvolution of Log-Frequency Spectrum,” Proc. SAPA-2004, 2004.
• M. Sondhi, “New Methods of Pitch Extraction,” IEEE Trans. Audio and Electroacoustics AU-16(2), 262–266, 1968.
• D. Talkin, “A robust algorithm for pitch tracking (RAPT),” in Speech Coding and Synthesis (W. B. Kleijn and K. K. Paliwal, eds.), 495–518, Elsevier, 1995.