TRANSCRIPT
E4896 Music Signal Processing (Dan Ellis) 2013-03-11 - /18
Lecture 8: Pitch Tracking
Dan Ellis, Dept. of Electrical Engineering, Columbia University
[email protected] http://www.ee.columbia.edu/~dpwe/e4896/
1. Pitch Tracking  2. Spectral Approaches  3. Time Domain  4. Example Algorithms
ELEN E4896 MUSIC SIGNAL PROCESSING
1. Pitch Tracking
• Pitch is a big part of hearing
  - orthogonal to formants (vowels) in speech
  - follows the fundamental frequency (f0) of sounds
  - because of speech? because of periodicity?
• Pitch extraction (tracking) is useful
  - for coding / representation (telephony)
  - for extracting information
• ... but can be tricky
  - sounds without a clear f0
Challenges to Pitch Extraction
• Pitch extraction challenges include:
  - noise / multiple f0s
  - voice activity detection
  - unclear f0
  - note segmentation
[Figure: spectrogram (freq/kHz vs time/s), spectra (dB ampl. vs freq/Hz), and waveform excerpts of the utterance “Oh, I’m just about to lose my mind...”]
Talkin, 1995
Multiple f0s
• Common in music (“polyphony”)
  - cross-terms...
  - identify-remove-repeat?
[Figure: spectrograms (Frequency vs Time) of polyphonic audio]
2. Spectral Approaches
• Just pick the most intense spectral peak?
  - the fundamental may not be the strongest...
[Figure: spectrograms (freq/kHz vs time/s) where upper harmonics outweigh the fundamental]
Spectral Templates
• “Filter” the log-frequency spectrum with a template: a matched filter...
[Figure: log-frequency spectrogram (250–8000 Hz, time/s) convolved (*) with a harmonic template (=) to give a pitch estimate]
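The template scoring can be sketched in a few lines of NumPy: score each candidate f0 by summing spectral magnitude at its harmonics, in effect a matched filter against a harmonic comb. This is a minimal illustration; the grid size, harmonic count, and FFT length are my choices, not the lecture's.

```python
import numpy as np

def harmonic_template_pitch(x, sr, fmin=100.0, fmax=1000.0, n_harm=5, n_fft=4096):
    """Score each candidate f0 by summing spectral magnitude at its
    harmonics: a crude matched filter against a harmonic template."""
    spec = np.abs(np.fft.rfft(x * np.hanning(len(x)), n_fft))
    cands = np.geomspace(fmin, fmax, 200)          # log-spaced f0 candidates
    scores = []
    for f0 in cands:
        # FFT bins of the first n_harm harmonics of this candidate
        bins = np.round(np.arange(1, n_harm + 1) * f0 * n_fft / sr).astype(int)
        scores.append(spec[bins[bins < len(spec)]].sum())
    return cands[int(np.argmax(scores))]

# synthetic harmonic tone at 220 Hz, amplitudes falling off as 1/k
sr = 8000
t = np.arange(4096) / sr
x = sum(np.sin(2 * np.pi * 220 * k * t) / k for k in range(1, 5))
f0 = harmonic_template_pitch(x, sr)
```

Because the whole harmonic pattern votes for the candidate, this finds the f0 even when the fundamental itself is weak.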
Cepstrum
• Spectrum shows periodicity at f0
  - reveal it via the Fourier transform of log|Y| = the cepstrum
[Figure: log spectrum (freq/Hz), cepstrum (quefrency), and cepstrogram (quefrency vs time/s)]
Noll, 1967
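A minimal cepstral pitch estimator along these lines: take the inverse FFT of the log magnitude spectrum and pick the strongest quefrency in a plausible pitch band. The search range and the small spectral floor are illustrative choices I've added, not part of the original method.

```python
import numpy as np

def cepstral_pitch(x, sr, fmin=80.0, fmax=300.0):
    """Cepstrum pitch: the FT of log|spectrum| turns the regular harmonic
    ripple into a peak at quefrency 1/f0 (in samples, sr/f0)."""
    n = len(x)
    spec = np.abs(np.fft.rfft(x * np.hanning(n)))
    log_spec = np.log(spec + 1e-3 * spec.max())    # floor keeps the log bounded
    ceps = np.fft.irfft(log_spec)
    qmin, qmax = int(sr / fmax), int(sr / fmin)    # quefrency search band
    peak_q = qmin + int(np.argmax(ceps[qmin:qmax]))
    return sr / peak_q

sr = 16000
t = np.arange(4096) / sr
x = sum(np.sin(2 * np.pi * 150 * k * t) / k for k in range(1, 8))  # 150 Hz tone
f0 = cepstral_pitch(x, sr)
```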
“Specmurt”
• Log-spectrum is like a convolution of pitch peaks with a harmonics profile
  - especially for multiple notes from one instrument (piano)
• Separate out by deconvolution
  - inverse Fourier transform of the log-frequency spectrum V(y)
  - divide out the harmonic template H(y)
Sagayama et al., 2004
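A toy specmurt deconvolution on a synthetic log-frequency spectrum: on a log-frequency axis every note shows the same harmonic pattern, so dividing in the Fourier domain recovers the pitch distribution. The Wiener-style `eps` regularization is an assumption I've added to keep the division stable, not part of Sagayama's formulation.

```python
import numpy as np

def specmurt(log_f_spectrum, harmonic_profile, eps=1e-3):
    """Deconvolve the observed log-frequency spectrum by the common
    harmonic profile: divide their Fourier transforms (regularized)."""
    V = np.fft.fft(log_f_spectrum)
    H = np.fft.fft(harmonic_profile, len(log_f_spectrum))
    U = V * np.conj(H) / (np.abs(H) ** 2 + eps)
    return np.real(np.fft.ifft(U))

# toy example: 36 bins/octave, harmonics 1..4, two notes at bins 10 and 30
bins_per_oct = 36
ks = np.arange(1, 5)
offsets = np.round(bins_per_oct * np.log2(ks)).astype(int)   # 0, 36, 57, 72
profile = np.zeros(offsets.max() + 1)
profile[offsets] = 1.0 / ks
obs = np.zeros(200)
for note_bin, amp in [(10, 1.0), (30, 0.7)]:
    obs[note_bin + offsets] += amp * profile[offsets]        # exact convolution
pitches = specmurt(obs, profile)
```

The recovered `pitches` array has peaks back at bins 10 and 30, i.e. a piano-roll-like view of the two notes.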
Sinusoid Tracks
• Do sinusoid tracking before pitch tracking
  - apply templates / sieve on the tracks
  - consider each strong sinusoid as a candidate
  - limits analysis to clearly periodic energy (normalized?)
Maher & Beauchamp 1994
[Figure: spectrogram and extracted sinusoid tracks (freq/kHz vs time/s)]
3. Time-Domain Approaches
• Periodicity is primarily a time-domain phenomenon
  - time-domain processing avoids the FFT
• Early approaches: look for a common gap between waveform features
[Figure: waveform excerpts (ampl. vs time/s)]
Gold & Rabiner 1969
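In that spirit, a crude feature-interval estimator: find prominent waveform peaks and take the median gap between them. The threshold and peak rule here are arbitrary illustrative choices, far simpler than Gold & Rabiner's parallel estimators.

```python
import numpy as np

def peak_interval_pitch(x, sr):
    """Time-domain sketch: pick prominent positive peaks and estimate
    the period as the median gap between successive peaks."""
    thresh = 0.5 * x.max()
    # strict local maxima above the threshold (indices into x)
    peaks = np.flatnonzero((x[1:-1] > x[:-2]) & (x[1:-1] >= x[2:]) &
                           (x[1:-1] > thresh)) + 1
    gaps = np.diff(peaks)
    return sr / np.median(gaps)

sr = 8000
t = np.arange(2000) / sr
x = np.sin(2 * np.pi * 200 * t) + 0.3 * np.sin(2 * np.pi * 400 * t)  # 200 Hz
f0 = peak_interval_pitch(x, sr)
```

The median makes the estimate robust to the occasional spurious or missed peak, which is exactly where such waveform-feature schemes tend to fail.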
Autocorrelation
• Directly measures the period of self-similarity
  - (the autocorrelation is the inverse FT of the power spectrum)
[Figure: spectrogram, autocorrelograms (lag/ms vs time/s), waveform excerpts, and autocorrelation functions (lag/s)]
Sondhi 1968
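A bare-bones autocorrelation tracker: pick the lag of the strongest autocorrelation peak within a plausible pitch range. The search bounds are illustrative; practical systems add windowing, normalization, and smoothing.

```python
import numpy as np

def autocorr_pitch(x, sr, fmin=60.0, fmax=500.0):
    """Estimate f0 as sr / lag, where lag is the strongest
    autocorrelation peak in the allowed range."""
    x = x - x.mean()
    r = np.correlate(x, x, mode='full')[len(x) - 1:]   # lags 0 .. N-1
    lmin, lmax = int(sr / fmax), int(sr / fmin)
    lag = lmin + int(np.argmax(r[lmin:lmax]))
    return sr / lag

sr = 8000
t = np.arange(2048) / sr
x = sum(np.sin(2 * np.pi * 160 * k * t) / k for k in range(1, 6))
f0 = autocorr_pitch(x, sr)
```

Excluding lags below `sr/fmax` matters: the autocorrelation is always largest at lag 0, so the search must start past it.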
Post-processing
• Can assume and exploit pitch smoothness
• Median filtering
  - only knows the per-frame best point
• Dynamic programming (Viterbi path)
  - can look at all underlying candidate values
[Figure: raw, median-31, and Viterbi-smoothed pitch tracks (lag/ms vs time/s)]
Talkin, 1995
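Both smoothers can be sketched directly. The Viterbi transition penalty on log-lag jumps is an assumed cost for illustration, not Talkin's exact formulation.

```python
import numpy as np

def median_smooth(track, width=5):
    """Median filter: removes isolated octave errors, but can only ever
    output one of the observed per-frame values."""
    pad = width // 2
    padded = np.pad(track, pad, mode='edge')
    return np.array([np.median(padded[i:i + width]) for i in range(len(track))])

def viterbi_smooth(candidates, scores, trans_penalty=1.0):
    """Tiny Viterbi: each frame has several candidate lags with scores;
    pick the path maximizing score minus |log-lag jump| penalties."""
    n_frames, n_cand = scores.shape
    loglag = np.log(candidates)
    jump = trans_penalty * np.abs(loglag[None, :] - loglag[:, None])
    cost = np.empty((n_frames, n_cand))
    back = np.zeros((n_frames, n_cand), dtype=int)
    cost[0] = scores[0]
    for t in range(1, n_frames):
        total = cost[t - 1][:, None] - jump        # rows: prev, cols: current
        back[t] = np.argmax(total, axis=0)
        cost[t] = scores[t] + np.max(total, axis=0)
    path = [int(np.argmax(cost[-1]))]
    for t in range(n_frames - 1, 0, -1):           # trace back the best path
        path.append(int(back[t][path[-1]]))
    return candidates[np.array(path[::-1])]

track = np.array([4.0, 4, 4, 8, 4, 4])             # lag/ms with one octave blip
smoothed = median_smooth(track, 3)

cands = np.array([4.0, 8.0])                       # candidate lags per frame
scores = np.array([[1, 0], [1, 0], [0.4, 0.6], [1, 0]], dtype=float)
path = viterbi_smooth(cands, scores)
```

The Viterbi version keeps the weaker 4 ms candidate in frame 3 because the jump penalty outweighs its slightly lower score, which is exactly what "looking at all underlying values" buys over the median.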
4. Examples: YIN
• Like autocorrelation, but:
  - difference function: robust to changes in amplitude
  - progressive normalization by the cumulative average: eliminates the lag = 0 advantage
  - outperforms others without post-processing
[Figure: YIN output: pitch (octaves re: 440 Hz), sqrt(aperiodicity), and sqrt(power) vs time/s]
de Cheveigné & Kawahara, 2002
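The two YIN ingredients named above, the difference function and its cumulative-mean normalization, in minimal form. The absolute threshold of 0.1 and the walk down to the local minimum are simplifications of the published algorithm (which also interpolates the minimum).

```python
import numpy as np

def yin_pitch(x, sr, fmin=60.0, fmax=500.0, thresh=0.1):
    """YIN sketch: squared-difference function d(tau), then the
    cumulative-mean-normalized d'(tau) = d(tau) * tau / sum_{1..tau} d,
    which equals 1 at lag 0 and so removes the lag-0 advantage."""
    max_lag = int(sr / fmin)
    w = len(x) - max_lag                           # comparison window length
    d = np.array([np.sum((x[:w] - x[tau:tau + w]) ** 2)
                  for tau in range(max_lag + 1)])
    cmndf = np.ones_like(d, dtype=float)
    cmndf[1:] = d[1:] * np.arange(1, max_lag + 1) / np.maximum(
        np.cumsum(d[1:]), 1e-12)
    lmin = int(sr / fmax)
    below = np.flatnonzero(cmndf[lmin:] < thresh)  # first dip below threshold
    lag = lmin + (int(below[0]) if len(below) else int(np.argmin(cmndf[lmin:])))
    while lag + 1 < len(cmndf) and cmndf[lag + 1] < cmndf[lag]:
        lag += 1                                   # walk down to the local min
    return sr / lag

sr = 8000
t = np.arange(2048) / sr
x = sum(np.sin(2 * np.pi * 250 * k * t) / k for k in range(1, 5))
f0 = yin_pitch(x, sr)
```

Taking the *first* sufficiently deep dip, rather than the global minimum, is what protects YIN against choosing a multiple of the period.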
Examples: sigmund~
• Miller Puckette’s Pd pitch tracker
• Sinusoids + sieve
• Also performs note segmentation and polyphony
Puckette et al. 1998
L(f) = Σ_k 1/(α + k) · Y(k·f)
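A toy harmonic sieve over a list of sinusoid peaks: each measured sinusoid lying near a harmonic of the candidate f0 contributes its amplitude, down-weighted as the harmonic number grows. The 1/(α + k) weight, tolerance, and harmonic count here are assumptions for illustration, not sigmund~'s actual constants.

```python
import numpy as np

def sieve_likelihood(peaks, f0, alpha=1.0, n_harm=8, tol=0.01):
    """Score a candidate f0 against sinusoid (freq, amp) pairs: a peak
    within `tol` (relative) of harmonic k*f0 adds amp / (alpha + k)."""
    score = 0.0
    for freq, amp in peaks:
        k = int(round(freq / f0))
        if 1 <= k <= n_harm and abs(freq - k * f0) <= tol * k * f0:
            score += amp / (alpha + k)
    return score

# sinusoid tracks measured from a (synthetic) 220 Hz note
peaks = [(220.0, 1.0), (441.0, 0.5), (662.0, 0.33), (879.0, 0.25)]
cands = np.arange(100.0, 500.0, 5.0)
best = cands[int(np.argmax([sieve_likelihood(peaks, f) for f in cands]))]
```

Working from a short peak list instead of a full spectrum is what lets a sieve like this run cheaply in real time.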
Examples: Weighted Sieve Polyphonic
• Based on a more complex multiple-f0 system
• Chooses f0 via a weighted sieve
  - the spectrum is whitened before measuring peaks
  - weights are optimized on training data
• Multiple f0s are found via cancel-and-repeat, or joint optimization
Klapuri 2006
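The cancel-and-repeat loop can be sketched as follows. This simplified stand-in uses plain (unweighted) harmonic sums and an ad-hoc cancellation width where Klapuri uses whitening and a trained weighted sieve.

```python
import numpy as np

def iterative_f0s(spec, sr, n_fft, n_f0=2, fmin=80.0, fmax=600.0, n_harm=6):
    """Cancel-and-repeat sketch: find the f0 with the largest harmonic
    sum, zero a neighborhood around each of its harmonics, repeat."""
    spec = spec.copy()
    cands = np.geomspace(fmin, fmax, 300)
    found = []
    for _ in range(n_f0):
        scores = []
        for f0 in cands:
            bins = np.round(np.arange(1, n_harm + 1) * f0 * n_fft / sr).astype(int)
            scores.append(spec[bins[bins < len(spec)]].sum())
        f0 = cands[int(np.argmax(scores))]
        found.append(float(f0))
        for k in range(1, n_harm + 1):             # cancel its harmonics
            b = int(round(k * f0 * n_fft / sr))
            spec[max(b - 2, 0):b + 3] = 0.0
    return found

# two-note mixture: 200 Hz and 310 Hz, four harmonics each
sr, n_fft = 8000, 4000
t = np.arange(n_fft) / sr
x = (sum(np.sin(2 * np.pi * 200 * k * t) / k for k in range(1, 5)) +
     sum(0.8 * np.sin(2 * np.pi * 310 * k * t) / k for k in range(1, 5)))
f0s = sorted(iterative_f0s(np.abs(np.fft.rfft(x, n_fft)), sr, n_fft))
```

The cancellation step is what suppresses the cross-terms mentioned earlier: once the dominant note's harmonics are removed, the second estimate is no longer pulled toward shared partials.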
Example: SAcC
• Subband Autocorrelation Classification
  - multiple subbands
  - autocorrelation + PCA
  - a trained MLP classifier to find the pitch
[Diagram: Sound → cochlea filterbank → short-time autocorrelation over frequency channels (correlogram slice: frequency × lag, over time) → MLP classifier → HMM pitch tracker → Pitch]
Lee & Ellis 2012
Summary
• Pitch: important perceptually, for speech and music
• Spectrum: combine multiple harmonics to find f0
• Time: autocorrelation directly measures periodicity
References
• A. de Cheveigné & H. Kawahara, “YIN, a Fundamental Frequency Estimator for Speech and Music,” J. Acoust. Soc. Am. 111, 1917–1930, 2002.
• B. Gold & L. Rabiner, “Parallel Processing Techniques for Estimating Pitch Periods of Speech in the Time Domain,” J. Acoust. Soc. Am. 46(2), 442–448, 1969.
• A. Klapuri, “Multiple Fundamental Frequency Estimation by Summing Harmonic Amplitudes,” Proc. Int. Symp. Music IR, 216–221, 2006.
• B. S. Lee & D. Ellis, “Noise Robust Pitch Tracking by Subband Autocorrelation Classification,” Proc. Interspeech-12, P3b.05, 2012.
• R. Maher & J. Beauchamp, “Fundamental frequency estimation of musical signals using a two-way mismatch procedure,” J. Acoust. Soc. Am. 95(4), 2254–2263, 1994.
• A. Noll, “Cepstrum Pitch Determination,” J. Acoust. Soc. Am. 41, 293–309, 1967.
• M. Puckette, T. Apel, D. Zicarelli, “Real-time audio analysis tools for Pd and MSP,” Proc. Int. Comp. Music Conf., 109–112, 1998.
• S. Sagayama, K. Takahashi, H. Kameoka and T. Nishimoto, “Specmurt Anasylis: A Piano-Roll-Visualization of Polyphonic Music Signal by Deconvolution of Log-Frequency Spectrum,” Proc. SAPA-2004, 2004.
• M. Sondhi, “New Methods of Pitch Extraction,” IEEE Trans. Audio and Electroacoustics AU-16(2), 262–266, 1968.
• D. Talkin, “A robust algorithm for pitch tracking (RAPT),” in Speech Coding and Synthesis (W. B. Kleijn and K. K. Paliwal, eds.), 495–518, Elsevier, 1995.