Automatic Pitch Tracking September 18, 2014

Upload: meryl-mills

Post on 11-Jan-2016


Page 1: Automatic Pitch Tracking September 18, 2014 The Digitization of Pitch The blue line represents the fundamental frequency (F0) of the speaker’s voice

Automatic Pitch Tracking

September 18, 2014

The Digitization of Pitch

• Praat can give us a representation of speech that looks like:

• The blue line represents the fundamental frequency (F0) of the speaker’s voice.

• Also known as a pitch track

• How can we automatically “track” F0 in a sample of speech?

Pitch Tracking

• Voicing:

• Air flow through vocal folds

• Rapid opening and closing due to Bernoulli Effect

• Each cycle sends an acoustic shockwave through the vocal tract

• …which takes the form of a complex wave.

• The rate at which the vocal folds open and close becomes the fundamental frequency (F0) of a voiced sound.

Voicing Bars

Voicing Bars

Individual glottal pulses

Voicing = Complex Wave

• Note: voicing is not perfectly periodic.

• …always some random variation from one cycle to the next.

• How can we measure the fundamental frequency of a complex wave?

• The basic idea: figure out the period between successive cycles of the complex wave.

• Fundamental frequency = 1 / period

duration = ???
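
As a small worked illustration of F0 = 1 / period, the sketch below converts the start times of successive cycles into per-cycle periods and F0 values. The pulse times are made-up numbers and the function name `f0_from_pulse_times` is hypothetical; the slight cycle-to-cycle variation in period shows up as variation in F0.

```python
def f0_from_pulse_times(pulse_times):
    """Return (period, F0) pairs for each pair of successive pulses.

    pulse_times: times in seconds at which successive cycles begin.
    """
    results = []
    for start, end in zip(pulse_times, pulse_times[1:]):
        period = end - start                    # duration of one cycle (s)
        results.append((period, 1.0 / period))  # F0 in Hz
    return results


# Hypothetical pulse times (seconds); not taken from any real recording.
pulses = [0.000, 0.005, 0.0101, 0.0149]
for period, f0 in f0_from_pulse_times(pulses):
    print(f"period = {period * 1000:.1f} ms  ->  F0 = {f0:.0f} Hz")
```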

Measuring F0

• To figure out where one cycle ends and the next begins…

• The basic idea is to find how well successive “chunks” of a waveform match up with each other.

• One period = the length of the chunk that matches up best with the next chunk.

• Automatic Pitch Tracking parameters to think about:

1. Window size (i.e., chunk size)

2. Step size

3. Frequency range (= period range)

Window (Chunk) Size

Here’s an example of a small window

Window (Chunk) Size

Here’s an example of a large(r) window

Initial window of the waveform is compared to another window (of the same duration) at a later point in the waveform

Matching

The waveforms in the two windows are compared to see how well they match up.

Correlation = measure of how well the two windows match

???

Autocorrelation

• The measure of correlation =

• Sum of the point-by-point products of the two chunks.

• The technical name for this is autocorrelation…

• because two parts of the same wave are being matched up against each other.

• (“auto” = self)

Autocorrelation Example

• Ex: consider window x, with n samples…

• What’s its correlation with window y?

• (Note: window y must also have n samples)

• x1 = first sample of window x

• x2 = second sample of window x

• …

• xn = nth (final) sample of window x

• y1 = first sample of window y, etc.

• Correlation (R) = x1*y1 + x2*y2 + … + xn*yn

• The larger R is, the better the correlation.
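
The sum-of-products formula above can be sketched in a few lines of Python. The function name `correlation` and the sample values are hypothetical choices for illustration, and this is the raw, unnormalized measure described on the slide, not necessarily what any particular pitch tracker computes internally.

```python
def correlation(x, y):
    """Sum of point-by-point products of two equal-length windows."""
    if len(x) != len(y):
        raise ValueError("windows x and y must have the same number of samples")
    return sum(xi * yi for xi, yi in zip(x, y))


# A window matched against itself gives a large positive R ...
window = [0.5, 1.0, 0.5, -0.5, -1.0, -0.5]
print(correlation(window, window))

# ... while matching it against its inverted copy gives a negative R.
inverted = [-s for s in window]
print(correlation(window, inverted))
```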

By the Numbers

Sample      1     2     3     4     5     6
x          .8    .3   -.2   -.5    .4    .8
y         -.3   -.1    .1    .3    .1   -.1
product  -.24  -.03  -.02  -.15   .04  -.08

Sum of products = -.48

• These two chunks are poorly correlated with each other.

By the Numbers, part 2

Sample      1     2     3     4     5     6
x          .8    .3   -.2   -.5    .4    .8
z          .7    .4   -.1   -.4    .1    .4
product   .56   .12   .02   .20   .04   .32

Sum of products = 1.26

• These two chunks are well correlated with each other.

(or at least better than the previous pair)

• Note: matching peaks count for more than matches close to 0.
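
The arithmetic in the two tables can be checked with a short script. The values of x, y, and z are copied from the tables; the helper name `products` is only for illustration. Printing the individual products also makes the last note visible: the pair of matching peaks (.8 and .7) contributes far more to the sum than a pair of samples near zero.

```python
x = [0.8, 0.3, -0.2, -0.5, 0.4, 0.8]
y = [-0.3, -0.1, 0.1, 0.3, 0.1, -0.1]
z = [0.7, 0.4, -0.1, -0.4, 0.1, 0.4]


def products(a, b):
    """Point-by-point products of two equal-length windows."""
    return [ai * bi for ai, bi in zip(a, b)]


xy = products(x, y)
xz = products(x, z)

# Rounding hides tiny floating-point errors in the decimal arithmetic.
print("x*y:", [round(p, 2) for p in xy], "sum =", round(sum(xy), 2))
print("x*z:", [round(p, 2) for p in xz], "sum =", round(sum(xz), 2))
```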

Back to (Digital) Reality

The waveforms in the two windows are compared to see how well they match up.

Correlation = measure of how well the two windows match

???

These two windows are poorly correlated

Next: the pitch tracking algorithm moves further down the waveform and grabs a new window

The distance the algorithm moves forward in the waveform is called the step size

“step”

Matching, again

The next window gets compared to the original.

???

Matching, again

The next window gets compared to the original.

???

These two windows are also poorly correlated

The algorithm keeps chugging and, eventually…

another “step”

Matching, again

The best match is found.

???

These two windows are highly correlated

The fundamental period can be determined by calculating the length of time between the start of window 1 and the start of (well correlated) window 2.

period
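
To make the matching procedure concrete, here is a minimal sketch that compares the first window of a synthetic waveform with windows at later offsets, scores each offset with the sum of products, and converts the best-matching offset into a period and a frequency. The sample rate, the 200 Hz test tone, the window length, and the function names are all assumptions chosen for illustration. The search is also limited to offsets between `min_offset` and `max_offset`, an assumption that skips very small offsets where a window trivially matches itself.

```python
import math

SAMPLE_RATE = 8000  # samples per second (assumed for this sketch)


def make_wave(f0, n_samples):
    """Synthetic complex wave: a fundamental plus a weaker second harmonic."""
    return [
        math.sin(2 * math.pi * f0 * i / SAMPLE_RATE)
        + 0.5 * math.sin(2 * math.pi * 2 * f0 * i / SAMPLE_RATE)
        for i in range(n_samples)
    ]


def correlation(x, y):
    """Sum of point-by-point products of two equal-length windows."""
    return sum(xi * yi for xi, yi in zip(x, y))


def best_offset(wave, window_size, min_offset, max_offset):
    """Offset (in samples) of the later window that best matches the first."""
    first = wave[:window_size]
    scores = {
        offset: correlation(first, wave[offset:offset + window_size])
        for offset in range(min_offset, max_offset + 1)
    }
    return max(scores, key=scores.get)


wave = make_wave(f0=200, n_samples=400)
offset = best_offset(wave, window_size=80, min_offset=20, max_offset=70)
period = offset / SAMPLE_RATE  # seconds between the two window starts
print(f"best offset = {offset} samples, period = {period} s, F0 = {1 / period} Hz")
```

Real pitch trackers, including Praat's, add normalization and other refinements that this sketch leaves out.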

Mopping up

period

• Frequency is 1 / period

• Q: How many possible periods does the algorithm need to check?

• Frequency range (default in Praat: 75 to 600 Hz)
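
One way to approach the question on this slide is to convert the frequency range into a range of candidate periods measured in samples. The sketch assumes a 44,100 Hz sampling rate, which is not stated on the slide, and the function name `candidate_lags` is hypothetical; the 75 to 600 Hz range is the Praat default quoted on the slide.

```python
import math


def candidate_lags(sample_rate, f0_min, f0_max):
    """Whole-sample periods (lags) whose frequency lies in [f0_min, f0_max]."""
    shortest = math.ceil(sample_rate / f0_max)   # highest F0 -> shortest period
    longest = math.floor(sample_rate / f0_min)   # lowest F0 -> longest period
    return range(shortest, longest + 1)


lags = candidate_lags(sample_rate=44100, f0_min=75, f0_max=600)
print(f"periods from {lags[0]} to {lags[-1]} samples: {len(lags)} candidates")

# Narrowing the range means fewer comparisons per analysis window.
narrow = candidate_lags(sample_rate=44100, f0_min=100, f0_max=300)
print(f"periods from {narrow[0]} to {narrow[-1]} samples: {len(narrow)} candidates")
```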

Moving on

• Another comparison window is selected and the whole process starts over again.

[Figure: pitch track of the utterance “Uhm, I would like a flight to Seattle from Albuquerque”. Horizontal axis: Time (s), ticks at 1, 2, 3, 4. Vertical axis: F0 (Hz), ticks at 200, 300, 400.]

• The algorithm ultimately spits out a pitch track.

• This one shows you the F0 value at each step.
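
Putting the pieces together, the following sketch produces a crude pitch track for a synthetic signal whose F0 jumps from 200 Hz to 250 Hz halfway through. Everything here is an assumption made for illustration: the 8,000 Hz sampling rate, the test signal, the parameter values, and the function names. Here `step_size` is taken to be the distance between successive analysis windows, so the track has one F0 value per step. It is a bare-bones version of the procedure described in these slides, not Praat's actual algorithm.

```python
import math

SAMPLE_RATE = 8000  # Hz; assumed for this sketch


def tone(f0, n_samples):
    """Complex wave with a fundamental at f0 and a weaker second harmonic."""
    return [
        math.sin(2 * math.pi * f0 * i / SAMPLE_RATE)
        + 0.5 * math.sin(2 * math.pi * 2 * f0 * i / SAMPLE_RATE)
        for i in range(n_samples)
    ]


def correlation(x, y):
    """Sum of point-by-point products of two equal-length windows."""
    return sum(xi * yi for xi, yi in zip(x, y))


def track_pitch(wave, window_size, step_size, f0_min, f0_max):
    """Return a list of (time in seconds, F0 in Hz), one entry per step."""
    shortest = math.ceil(SAMPLE_RATE / f0_max)   # shortest period to check
    longest = math.floor(SAMPLE_RATE / f0_min)   # longest period to check
    track = []
    start = 0
    while start + longest + window_size <= len(wave):
        first = wave[start:start + window_size]
        # Score every candidate period and keep the best-matching one.
        best = max(
            range(shortest, longest + 1),
            key=lambda lag: correlation(
                first, wave[start + lag:start + lag + window_size]
            ),
        )
        track.append((start / SAMPLE_RATE, SAMPLE_RATE / best))
        start += step_size  # move on to the next analysis window
    return track


# Half a second at 200 Hz followed by half a second at 250 Hz.
wave = tone(200, 4000) + tone(250, 4000)
track = track_pitch(wave, window_size=160, step_size=400, f0_min=140, f0_max=300)
for time, f0 in track:
    print(f"{time:.2f} s  {f0:.0f} Hz")
```

A smaller `step_size` would give more points per second in the track, at the cost of more computation.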

Thanks to Chilin Shih for making these materials available

Pitch Tracking in Praat

• Play with F0 range.

• Create Pitch Object.

• Also go To Manipulation…Pitch.

• Also check out:

Summing Up

• Pitch tracking uses three parameters

1. Window size

• Ensures reliability

• In Praat, the window size is always three times the longest possible period.

• E.g.: 3 × 1/75 = .04 sec.

2. Step size

• For temporal precision

3. Frequency range

• Reduces computational load
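
The window-size arithmetic can be spelled out as follows. The factor of three and the 75 Hz floor come from the slide; the 44,100 Hz sampling rate and the function name `window_duration` are assumptions added for illustration.

```python
def window_duration(lowest_f0, periods_per_window=3):
    """Window length in seconds: a few times the longest possible period."""
    longest_period = 1.0 / lowest_f0  # lowest frequency = longest period
    return periods_per_window * longest_period


duration = window_duration(lowest_f0=75)
print(f"window = {duration:.2f} s")

# Assumed sampling rate, used only to express the window in samples.
sample_rate = 44100
print(f"window = {round(duration * sample_rate)} samples")
```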

Deep Thought Questions

• What might happen if:

• The shortest period checked is longer than the fundamental period?

• AND two fundamental periods fit inside a window?

• Potential Problem #1: Pitch Halving

• The pitch tracker thinks the fundamental period is twice as long as it is in reality.

• It estimates F0 to be half of its actual value
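
Pitch halving can be reproduced with a toy example. In the sketch below a 300 Hz tone is analyzed with a search range whose shortest candidate period is longer than the true period, so the best match the search can find is two periods away. The 12,000 Hz sampling rate, the tone, the search ranges, and the function names are all assumptions, chosen so that the arithmetic works out in whole samples.

```python
import math

SAMPLE_RATE = 12000  # Hz; assumed so that one 300 Hz period is 40 samples


def tone(f0, n_samples):
    """Complex wave with a fundamental at f0 and a weaker second harmonic."""
    return [
        math.sin(2 * math.pi * f0 * i / SAMPLE_RATE)
        + 0.5 * math.sin(2 * math.pi * 2 * f0 * i / SAMPLE_RATE)
        for i in range(n_samples)
    ]


def correlation(x, y):
    """Sum of point-by-point products of two equal-length windows."""
    return sum(xi * yi for xi, yi in zip(x, y))


def estimate_f0(wave, window_size, f0_min, f0_max):
    """Estimate F0 at the start of the wave, checking only periods in range."""
    shortest = math.ceil(SAMPLE_RATE / f0_max)
    longest = math.floor(SAMPLE_RATE / f0_min)
    first = wave[:window_size]
    best = max(
        range(shortest, longest + 1),
        key=lambda lag: correlation(first, wave[lag:lag + window_size]),
    )
    return SAMPLE_RATE / best


wave = tone(300, 600)

# Search range that includes the true period (40 samples).
good = estimate_f0(wave, window_size=120, f0_min=200, f0_max=400)

# Search range whose shortest period (60 samples) is longer than the true
# period: the best available match is two periods away.
halved = estimate_f0(wave, window_size=120, f0_min=140, f0_max=200)

print(f"range 200-400 Hz: {good:.0f} Hz")
print(f"range 140-200 Hz: {halved:.0f} Hz")
```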

Pitch Halving

pitch is halved

Check out normal file in Praat.

More Deep Thoughts

• What might happen if:

• The shortest period checked is less than half of the fundamental period?

• AND the second half of the fundamental cycle is very similar to the first?

• Potential Problem #2: Pitch doubling

• The pitch tracker thinks the fundamental period is half as long as it actually is.

• It estimates the F0 to be twice as high as it is in reality.

Pitch Doubling

pitch is doubled

Microperturbations

• Another problem:

• Speech waveforms are partly shaped by the type of segment being produced.

• Pitch tracking can become erratic at the juncture of two segments.

• In particular:

• voiced to voiceless segments

• sonorants to obstruents

• These discontinuities in F0 are known as microperturbations.

• Also: transitions between modal and creaky voicing tend to be problematic.