
Kanru Hua (IE598 Final Presentation) 1

Combining Auditory Preprocessing and Bayesian Estimation for Robust Formant Tracking

(Gläser et al., 2010)


Background

● In the context of speech processing, formants are resonances of the vocal tract.

● Formant frequencies have a close link to vowel quality.

● Applications: speech recognition/synthesis, speech enhancement, hearing aids, language learning tools, ...

[Figure: time-frequency plot of the utterance “Author of the danger trail, ...”; axes: time, frequency]


Architecture (simplified)

[Figure: block diagram. Input: Speech Signal. Blocks: Auditory Filterbank, Gender Detection, Enhancement, Bayesian Mixture Filtering, Adaptive Frequency Range Segmentation, and one Bayesian Smoothing block per formant. Outputs: F1, F2, ..., FN.]


Bayesian Filtering

● Think of it as a generalization of the Kalman filter

● Define the belief (message) as the posterior probability of the hidden state given the observations so far

[Figure: chain of hidden states x1, x2, x3, x4 (formant frequencies) with observations y1, y2, y3, y4 (filterbank output); each time step consists of a predict step and an update step]
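The predict/update recursion can be sketched on a discretized frequency axis. This is a minimal illustration, not the paper's implementation: the grid size, the random-walk transition matrix, and the likelihood values are all made-up assumptions.

```python
import numpy as np

def predict(belief, transition):
    """Predict step: push the previous posterior through the state dynamics.

    belief[i]        = p(x_{t-1} = i | y_1..y_{t-1})
    transition[i, j] = p(x_t = j | x_{t-1} = i)
    """
    return belief @ transition

def update(predicted, likelihood):
    """Update step: weight the prediction by p(y_t | x_t) and renormalize."""
    posterior = predicted * likelihood
    return posterior / posterior.sum()

# Toy setup (hypothetical numbers): 5 frequency bins, random-walk dynamics.
n_bins = 5
transition = np.zeros((n_bins, n_bins))
for i in range(n_bins):
    for j in (i - 1, i, i + 1):
        if 0 <= j < n_bins:
            transition[i, j] = 1.0
transition /= transition.sum(axis=1, keepdims=True)

belief = np.full(n_bins, 1.0 / n_bins)            # flat prior
likelihood = np.array([0.1, 0.1, 0.6, 0.1, 0.1])  # observation favours bin 2
belief = update(predict(belief, transition), likelihood)
```

When the dynamics and likelihood are linear-Gaussian, this recursion reduces to the Kalman filter; the grid version makes no such assumption.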


Bayesian Filtering

● Formant frequencies are not normally distributed, so the Gaussian assumption of the Kalman filter does not hold

● Particle filtering (non-parametric): multi-modality is not guaranteed to be maintained

– To illustrate why, let’s suppose

● This leads us to mixture filtering, a technique borrowed from the computer vision community.


Bayesian Mixture Filtering

To find the weights:

(Vermaak et al., 2003)

(each corresponds to a formant)
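The equations on this slide did not survive in the transcript. Assuming the slide follows the standard formulation of Vermaak et al. (2003), the target belief is a weighted sum of M component beliefs, and each weight is updated in proportion to how well its component predicted the new observation. The notation below (π for weights, p_m for component beliefs) is an assumption, not taken from the slide.

```latex
% Target belief as a mixture of M component beliefs (one per formant)
p(x_t \mid y_{1:t}) = \sum_{m=1}^{M} \pi_{m,t}\, p_m(x_t \mid y_{1:t}),
\qquad \sum_{m=1}^{M} \pi_{m,t} = 1

% Weight update: previous weight times the component's predictive likelihood
\pi_{m,t} = \frac{\pi_{m,t-1}\, p_m(y_t \mid y_{1:t-1})}
                 {\sum_{n=1}^{M} \pi_{n,t-1}\, p_n(y_t \mid y_{1:t-1})},
\qquad
p_m(y_t \mid y_{1:t-1}) = \int p(y_t \mid x_t)\, p_m(x_t \mid y_{1:t-1})\, dx_t
```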


Bayesian Mixture Filtering

A quick summary:

Propagation of each component (formant) is independent of the others

The re-weighting step is the only place where mixture components interact

[Figure: target belief and component beliefs (at time t)]
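The two points of this summary can be sketched as one time step of a grid-based mixture filter. This is a hedged illustration: the function name `mixture_filter_step`, the grid representation, and all numbers are assumptions, and the paper's actual component filters may differ.

```python
import numpy as np

def mixture_filter_step(weights, beliefs, transition, likelihood):
    """One time step of grid-based mixture filtering (illustrative sketch).

    weights[m]       : mixture weight of component m at time t-1
    beliefs[m, i]    : component belief p_m(x_{t-1} = i | y_1..y_{t-1})
    transition[i, j] : p(x_t = j | x_{t-1} = i)
    likelihood[j]    : p(y_t | x_t = j)
    """
    # Predict and update every component independently of the others.
    predicted = beliefs @ transition
    unnormalized = predicted * likelihood
    evidence = unnormalized.sum(axis=1)      # p_m(y_t | y_1..y_{t-1})
    new_beliefs = unnormalized / evidence[:, None]

    # Re-weighting: the only step where the components interact.
    new_weights = weights * evidence
    new_weights = new_weights / new_weights.sum()
    return new_weights, new_beliefs

# Toy example (hypothetical numbers): two components on a 6-bin grid.
transition = np.eye(6)                       # static dynamics for simplicity
beliefs = np.array([[0.5, 0.5, 0.0, 0.0, 0.0, 0.0],
                    [0.0, 0.0, 0.0, 0.0, 0.5, 0.5]])
weights = np.array([0.5, 0.5])
likelihood = np.array([0.4, 0.3, 0.1, 0.1, 0.05, 0.05])

weights, beliefs = mixture_filter_step(weights, beliefs, transition, likelihood)
target = weights @ beliefs                   # target belief at time t
```

In this toy case the weighted sum of the updated components equals the posterior a single filter would compute from the mixed prior, which is the property the re-weighting step is meant to preserve.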


Mixture Segmentation

● However, mixture filtering still does not prevent belief diffusion.

(Mixture components can become over-general over time as they propagate independently, and the order of the formants is unconstrained.)

● Solution: introduce hard frequency boundaries R1, R2, …, RM between formants/mixture components


Mixture Segmentation

● We need to modify & re-weight component beliefs to implement these hard boundaries

● Concretely, set out-of-range probabilities to zero while making sure the target distribution is kept unchanged

(accumulation)

(truncation & re-weighting)
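One way to read the accumulation and truncation/re-weighting steps is sketched below. The slide's equations are not in the transcript, so this is an interpretation under stated assumptions: beliefs live on a discrete frequency grid, and `boundaries` (a hypothetical name) holds bin indices such that component m owns bins `boundaries[m]` up to but excluding `boundaries[m + 1]`.

```python
import numpy as np

def segment_mixture(weights, beliefs, boundaries):
    """Restrict each component to its own frequency range (illustrative).

    weights[m], beliefs[m, i] : mixture before segmentation
    boundaries                : bin indices [b_0, ..., b_M]
    """
    # Accumulation: the target belief is the weighted sum of all components.
    target = weights @ beliefs

    new_weights = np.zeros_like(weights)
    new_beliefs = np.zeros_like(beliefs)
    for m in range(len(weights)):
        lo, hi = boundaries[m], boundaries[m + 1]
        mass = target[lo:hi].sum()
        # Truncation: zero outside [lo, hi). Re-weighting: the new weight is
        # the target mass inside the range, so the target stays unchanged.
        new_weights[m] = mass
        if mass > 0:
            new_beliefs[m, lo:hi] = target[lo:hi] / mass
    return new_weights, new_beliefs

# Toy example (hypothetical numbers): two overlapping components.
weights = np.array([0.6, 0.4])
beliefs = np.array([[0.4, 0.3, 0.2, 0.1, 0.0, 0.0],
                    [0.0, 0.0, 0.1, 0.2, 0.3, 0.4]])
target_before = weights @ beliefs
new_weights, new_beliefs = segment_mixture(weights, beliefs, [0, 3, 6])
target_after = new_weights @ new_beliefs
```

The check `target_before` versus `target_after` illustrates the requirement that the target distribution is kept unchanged while each component becomes zero outside its range.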


Adaptive Segmentation

● The next step is to determine R1, R2, … RM, given component beliefs before segmentation

● We run a Viterbi search (in other words, dynamic programming) to find the most likely segmentation

– State space: assignment from (discretized) frequency to mixture component

Other transitions have zero probability (disabled)
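The dynamic program can be sketched as follows. The transition and emission probabilities used in the paper are not in the transcript, so this sketch makes its own assumptions: moving up the frequency axis, the label may only stay on the same component or advance to the next one (every other transition is disabled), and each bin is scored by the log of the weighted component belief. The function name `viterbi_segmentation` and the `eps` floor are hypothetical.

```python
import numpy as np

def viterbi_segmentation(weights, beliefs, eps=1e-12):
    """Assign each frequency bin to one component, in increasing order.

    Returns boundaries [b_0, ..., b_M]; component m owns bins
    b_m up to but excluding b_{m+1}. Requires n_bins >= n_comp.
    """
    n_comp, n_bins = beliefs.shape
    score = np.log(weights[:, None] * beliefs + eps)   # assumed per-bin score

    best = np.full((n_bins, n_comp), -np.inf)
    back = np.zeros((n_bins, n_comp), dtype=int)
    best[0, 0] = score[0, 0]          # the lowest bin belongs to component 0
    for f in range(1, n_bins):
        for m in range(n_comp):
            stay = best[f - 1, m]
            advance = best[f - 1, m - 1] if m > 0 else -np.inf
            # Only "stay" and "advance by one" are allowed transitions.
            if advance > stay:
                best[f, m], back[f, m] = advance + score[m, f], m - 1
            else:
                best[f, m], back[f, m] = stay + score[m, f], m

    # Backtrack from the highest bin, which must belong to the last component.
    labels = np.empty(n_bins, dtype=int)
    labels[-1] = n_comp - 1
    for f in range(n_bins - 1, 0, -1):
        labels[f - 1] = back[f, labels[f]]
    firsts = [int(np.argmax(labels == m)) for m in range(1, n_comp)]
    return [0] + firsts + [n_bins]

# Toy example (hypothetical numbers).
weights = np.array([0.5, 0.5])
beliefs = np.array([[0.5, 0.4, 0.1, 0.0, 0.0, 0.0],
                    [0.0, 0.0, 0.05, 0.25, 0.3, 0.4]])
boundaries = viterbi_segmentation(weights, beliefs)
```

Restricting the transitions this way enforces the formant ordering that plain mixture filtering leaves unconstrained.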


Bayesian Smoothing

● So far, the estimate at time t is based on past observations (y1, y2, …, yt) only. The re-weighted beliefs may still be ambiguous.

● We mitigate this by incorporating observations from the reverse direction (if the whole sequence is known in advance), in a fashion similar to the backward pass of a Kalman smoother.
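The effect of the backward pass can be illustrated with a generic forward-backward smoother on a discrete grid. This is a textbook sketch, not the paper's per-formant smoother; the transition matrix and likelihoods are made-up assumptions.

```python
import numpy as np

def forward_backward(prior, transition, likelihoods):
    """Filtered and smoothed posteriors on a discrete grid.

    prior[i]          = p(x_1 = i)
    transition[i, j]  = p(x_{t+1} = j | x_t = i)
    likelihoods[t, j] = p(y_t | x_t = j)
    """
    n_steps, n_bins = likelihoods.shape
    predicted = np.zeros((n_steps, n_bins))
    filtered = np.zeros((n_steps, n_bins))

    # Forward pass: ordinary predict/update filtering.
    belief = prior
    for t in range(n_steps):
        predicted[t] = belief if t == 0 else belief @ transition
        belief = predicted[t] * likelihoods[t]
        belief = belief / belief.sum()
        filtered[t] = belief

    # Backward pass: fold in the information from future observations.
    smoothed = filtered.copy()
    for t in range(n_steps - 2, -1, -1):
        ratio = np.divide(smoothed[t + 1], predicted[t + 1],
                          out=np.zeros(n_bins), where=predicted[t + 1] > 0)
        smoothed[t] = filtered[t] * (transition @ ratio)
    return filtered, smoothed

# Toy example: the first observation is uninformative, the second is not.
transition = np.array([[0.8, 0.1, 0.1],
                       [0.1, 0.8, 0.1],
                       [0.1, 0.1, 0.8]])
prior = np.full(3, 1.0 / 3.0)
likelihoods = np.array([[1.0, 1.0, 1.0],
                        [0.1, 0.1, 0.8]])
filtered, smoothed = forward_backward(prior, transition, likelihoods)
```

Here the filtered belief at the first step is flat, while the smoothed belief at the same step already favours the bin supported by the later observation.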


Results

● Final formant frequency estimate:

● Evaluation:

– Tested on 34 and 56 sentences spoken by male and female speakers, respectively

– Added white/babble/car noise at 7 different signal-to-noise ratios
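Mixing noise into a clean signal at a prescribed signal-to-noise ratio is typically done by scaling the noise, as in the sketch below. The slide does not list the seven SNR values or describe the mixing procedure, so the function `add_noise_at_snr`, the 10 dB value, and the sine-wave stand-in for speech are all assumptions for illustration.

```python
import numpy as np

def add_noise_at_snr(speech, noise, snr_db):
    """Scale `noise` so the speech-to-noise power ratio equals snr_db (in dB)."""
    noise = noise[:len(speech)]
    speech_power = np.mean(speech ** 2)
    noise_power = np.mean(noise ** 2)
    gain = np.sqrt(speech_power / (noise_power * 10 ** (snr_db / 10)))
    return speech + gain * noise

rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
speech = np.sin(2 * np.pi * 200 * t)     # stand-in for a clean speech signal
noise = rng.standard_normal(16000)       # white noise
noisy = add_noise_at_snr(speech, noise, snr_db=10.0)

# Measure the SNR that was actually achieved.
achieved_db = 10 * np.log10(np.mean(speech ** 2) / np.mean((noisy - speech) ** 2))
```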


Results

● Compared against other formant tracking approaches

(percentage of error reduction)

● Time delay for real-time tracking: 120 ms on an Intel Q6600 @ 2.4 GHz


Summary

● A two-stage formant tracking method

– First stage: signal processing for feature extraction

– Second stage: Bayesian filtering on features

● Challenge 1: maintaining multi-modality

– Solution: mixture tracking

● Challenge 2: belief diffusion

– Solution: adaptive frequency range segmentation

● Post-processing: Bayesian smoothing (backward pass)

● Drawbacks:

– Computationally expensive

– Inevitable time delay for real-time tracking

– Formant continuity not guaranteed


References

● Gläser, Claudius, et al. "Combining Auditory Preprocessing and Bayesian Estimation for Robust Formant Tracking." IEEE Transactions on Audio, Speech, and Language Processing 18.2 (2010): 224-236.

● Vermaak, Jaco, Arnaud Doucet, and Patrick Pérez. "Maintaining Multimodality through Mixture Tracking." Proceedings of the Ninth IEEE International Conference on Computer Vision. IEEE, 2003.