speech in nips 2019/2020 - tsinghua university
Speech in NIPS 2019/2020
Lantian Li
2020-12-21
Untangling in Invariant Speech Recognition
• How information is untangled within DNNs trained to recognize speech.
• Defines several metrics (manifold capacity) that connect geometric properties of network representations to the separability of classes.
• A theory-driven geometric analysis of representation untangling in tasks.
• CNN, Deep Speech 2
• WSJ, Librispeech
Anchor points (support vectors)
Manifold capacity measures
• Mean-Field Theoretic (MFT) Manifold Capacity
FastSpeech: Fast, Robust and Controllable Text to Speech
• Neural TTS suffers from slow inference speed, a lack of robustness (word skipping or repeating), and limited controllability (speed or prosody).
• Using a feed-forward Transformer (instead of the conventional encoder-attention-decoder framework) to generate Mel-spectrograms in parallel.
• Using a length regulator to expand the phoneme sequence to match the length of the target Mel-spectrogram sequence.
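A minimal sketch of the length-regulator idea, assuming each phoneme comes with an integer duration (its number of Mel frames). The function name `length_regulate` and the `alpha` scaling factor are illustrative names, not taken from the slides; `alpha` only shows how scaling durations could change speaking speed.

```python
from typing import List, Sequence


def length_regulate(phoneme_states: Sequence[Sequence[float]],
                    durations: Sequence[int],
                    alpha: float = 1.0) -> List[List[float]]:
    """Repeat each phoneme hidden state once per Mel frame it should cover.

    phoneme_states: one hidden vector per phoneme.
    durations: number of Mel frames per phoneme (assumed to be given).
    alpha: hypothetical speed factor; values > 1 lengthen the output.
    """
    if len(phoneme_states) != len(durations):
        raise ValueError("need exactly one duration per phoneme")
    expanded: List[List[float]] = []
    for state, duration in zip(phoneme_states, durations):
        n_frames = max(0, int(round(duration * alpha)))
        # Copy the state for every frame so the rows are independent lists.
        expanded.extend(list(state) for _ in range(n_frames))
    return expanded


# Three phonemes lasting 2, 1 and 3 frames expand to 6 frame-level states.
states = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
frames = length_regulate(states, [2, 1, 3])
print(len(frames))  # → 6
```

Because the expanded sequence already has the target length, every Mel frame can be generated in parallel rather than one step at a time.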
FastSpeech
FastSpeech 2
Length regulator
Controllability
Robustness
https://speechresearch.github.io/fastspeech/
MelGAN: Generative Adversarial Networks for Conditional Waveform Synthesis
HiFi-GAN: Generative Adversarial Networks for Efficient and High Fidelity Speech Synthesis
Listening to Sounds of Silence for Speech Denoising
• A silent interval reveals noise characteristics.
• Several silent intervals assemble a time-varying noise distribution.
• Silent Interval Detection, Noise Estimation, Noise Removal
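The three stages can be illustrated with a deliberately simple stand-in. This is not the learned model behind these slides: silent frames are found with a plain energy threshold, the noise power is interpolated between them, and removal is a crude per-frame gain. All function names, the frame length, and the threshold are assumptions made for illustration.

```python
from typing import List, Sequence


def frame_powers(signal: Sequence[float], frame_len: int) -> List[float]:
    """Mean power of each non-overlapping frame (a trailing partial frame is dropped)."""
    n_frames = len(signal) // frame_len
    return [
        sum(x * x for x in signal[i * frame_len:(i + 1) * frame_len]) / frame_len
        for i in range(n_frames)
    ]


def detect_silent_frames(powers: Sequence[float], threshold: float) -> List[bool]:
    """Stage 1 (stand-in): a frame counts as silent if its power is below the threshold."""
    return [p < threshold for p in powers]


def estimate_noise_power(powers: Sequence[float], silent: Sequence[bool]) -> List[float]:
    """Stage 2 (stand-in): time-varying noise power.

    Silent frames expose the noise directly; other frames are linearly
    interpolated between the nearest silent frames (held constant at the edges).
    """
    idx = [i for i, s in enumerate(silent) if s]
    if not idx:
        raise ValueError("no silent frames to estimate noise from")
    estimate = []
    for i in range(len(powers)):
        left = max((j for j in idx if j <= i), default=None)
        right = min((j for j in idx if j >= i), default=None)
        if left is None:
            estimate.append(powers[right])
        elif right is None or left == right:
            estimate.append(powers[left])
        else:
            w = (i - left) / (right - left)
            estimate.append((1 - w) * powers[left] + w * powers[right])
    return estimate


def attenuate(signal: Sequence[float], frame_len: int,
              powers: Sequence[float], noise_est: Sequence[float]) -> List[float]:
    """Stage 3 (stand-in): scale each frame so its power drops by the noise estimate."""
    out: List[float] = []
    for i, (p, n) in enumerate(zip(powers, noise_est)):
        gain = max(0.0, 1.0 - n / p) ** 0.5 if p > 0 else 0.0
        out.extend(gain * x for x in signal[i * frame_len:(i + 1) * frame_len])
    return out
```

The interpolation step mirrors the point that several silent intervals, taken together, describe how the noise changes over time.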
Loss functions and training
Silent interval supervision
Data construction
• AVSPEECH: audio-video speech
• 2214 videos for training and 234 videos for testing
• DEMAND and Google’s AudioSet
• SNRs range over [-10 dB, 10 dB]
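A minimal sketch of how a noisy training example at a chosen SNR could be built, assuming the clean speech and the noise clip are equal-length lists of samples. The helper name `mix_at_snr` and the uniform sampling of the SNR are assumptions; the slides give only the range.

```python
import math
import random
from typing import List, Sequence


def mix_at_snr(speech: Sequence[float], noise: Sequence[float],
               snr_db: float) -> List[float]:
    """Scale `noise` so that speech power / noise power matches `snr_db`, then add."""
    if len(speech) != len(noise):
        raise ValueError("speech and noise must have the same length")
    p_speech = sum(s * s for s in speech) / len(speech)
    p_noise = sum(n * n for n in noise) / len(noise)
    if p_noise == 0:
        raise ValueError("noise clip is all zeros")
    # SNR_dB = 10 * log10(p_speech / p_noise_scaled)
    target_noise_power = p_speech / (10 ** (snr_db / 10))
    gain = math.sqrt(target_noise_power / p_noise)
    return [s + gain * n for s, n in zip(speech, noise)]


# Draw an SNR from the stated range (uniform sampling is an assumption).
snr_db = random.uniform(-10.0, 10.0)
noisy = mix_at_snr([1.0, -1.0, 1.0, -1.0], [0.5, 0.5, -0.5, -0.5], snr_db)
```

At -10 dB the noise carries ten times the power of the speech, so the lower end of this range is a hard denoising condition.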
Performance of SID
Ablation studies
Comparison with SOTA
Wav2vec 2.0: A Framework for Self-Supervised Learning of Speech Representations
• Exploring wav2vec 2.0 on speaker verification and language identification
• https://arxiv.org/abs/2012.06185
The Cone of Silence: Speech Separation by Localization
https://grail.cs.washington.edu/projects/cone-of-silence/