Advanced Speech Enhancement in Noisy Environments
Qiming Zhu
Supervisor: Prof. John Soraghan
Centre for Excellence in Signal and Image Processing, Dept. of Electronic and Electrical Engineering
[email protected]

Post on 14-Dec-2015

Slide 2
Advanced Speech Enhancement in Noisy Environments
Qiming Zhu
Supervisor: Prof. John Soraghan
Centre for Excellence in Signal and Image Processing, Dept. of Electronic and Electrical Engineering
[email protected]

Slide 3
Presentation structure
  • Introduction
  • Speech Enhancement
  • Improved Minima Controlled Recursive Averaging (IMCRA)
  • Robust Voice Activity Detection (VAD)
  • 1-D Local Binary Pattern (LBP)
  • 1-D LBP of energy based VAD
  • Performance Evaluation
  • Improved IMCRA
  • Performance Evaluation
  • Discussion & Conclusion

Slide 4
Introduction
Automatic speech recognition (ASR)
  • A speech recognition system aims to create intelligent machines that can hear, understand, and act on speech input.
  • Speech enhancement and VAD are integral parts of an ASR system.
Aim of current research
  • Improve recognition system performance in babble-noise backgrounds.

Slide 5
IMCRA
[Figure: IMCRA processing]
* Israel Cohen, "Noise spectrum estimation in adverse environments: improved minima controlled recursive averaging," IEEE Transactions on Speech and Audio Processing, 2003.

Slide 6
IMCRA Performance
IMCRA with babble
  • Clean signal
  • Noisy signal at 0 dB
  • Enhanced by IMCRA

Slide 7
1-D LBP

Slide 8
1-D LBP code
1-D LBP computes a code for each sample by thresholding its neighbouring samples against it.
[Figure: LBP code calculation for p = 8]
* Navin Chatlani et al., "Local binary patterns for 1-D signal processing," EUSIPCO 2010.

Slide 9
1-D LBP histogram
1-D LBP builds a histogram of the codes over each window of data.
[Figure: Overview of the 1-D LBP procedure on a histogram]

Slide 10
1-D LBP of energy
Short-time energy and the histogram
[Figure: Speech signals and the short-time energy. a) energy of clean speech signal, b) energy of noisy speech signal, c) histogram of clean speech energy, d) histogram of noisy speech energy.]
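
The 1-D LBP steps on Slides 8 to 10 can be illustrated with a short Python sketch. It is not the authors' implementation. The bit ordering (left neighbours in the low bits, right neighbours in the high bits) and the use of `>=` for thresholding follow one reading of the cited Chatlani et al. paper and should be treated as assumptions. The function names `lbp_1d`, `lbp_histogram`, and `short_time_energy` are hypothetical.

```python
def lbp_1d(x, p=8):
    """Return 1-D LBP codes for every sample that has p/2 neighbours on each side.

    Assumption: a neighbour contributes a 1 bit when it is >= the centre sample.
    """
    half = p // 2
    codes = []
    for i in range(half, len(x) - half):
        code = 0
        for r in range(half):
            # Left-hand neighbours fill bits 0 .. half-1.
            if x[i + r - half] >= x[i]:
                code |= 1 << r
            # Right-hand neighbours fill bits half .. p-1.
            if x[i + r + 1] >= x[i]:
                code |= 1 << (r + half)
        codes.append(code)
    return codes


def lbp_histogram(codes, p=8):
    """Count how often each of the 2**p possible codes occurs in a window."""
    counts = [0] * (2 ** p)
    for c in codes:
        counts[c] += 1
    return counts


def short_time_energy(x, frame_len):
    """Sum of squared samples over consecutive, non-overlapping frames."""
    return [
        sum(s * s for s in x[k:k + frame_len])
        for k in range(0, len(x) - frame_len + 1, frame_len)
    ]


if __name__ == "__main__":
    fs = 16000                      # sampling frequency reported in the experiments
    frame_len = fs * 5 // 1000      # 5 ms energy window -> 80 samples
    signal = [((n * 37) % 101 - 50) / 50.0 for n in range(fs // 10)]  # toy data
    energy = short_time_energy(signal, frame_len)
    codes = lbp_1d(energy, p=2)     # p = 2 as in the experimental setup
    print(lbp_histogram(codes, p=2))
```

With p = 2 each code takes only four values, so the histogram over a window is a four-bin summary of the local shape of the energy contour. How the histogram is turned into a speech/non-speech decision is not recoverable from the transcript, so that step is left out.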
Slide 11
1-D LBP of energy with offset value

Slide 12
1-D LBP of energy based VAD
System block diagram
[Figure: VAD block diagram]

Slide 13
VAD performance

Slide 14

Slide 15
Improved IMCRA
Experimental background
  • 198 samples from the VoxForge database, covering 9 speakers: 6 male and 3 female.
  • Sampling frequency: 16 kHz.
  • Babble noise from the NOISEX-92 database added at SNRs from -10 dB to 10 dB.
  • Energy window size set to 5 ms, p = 2, histogram size set to 30 ms.
  • Segmental SNR and weighted spectral slope (WSS) are used to compare performance.
* Klatt et al., "Prediction of perceived phonetic distance from critical band spectra," IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP), 1982.

Slide 16
Performance
Improved IMCRA with babble noise
  • Clean signal
  • Noisy signal (SNR at 0 dB)
  • IMCRA
  • Improved IMCRA

Slide 17
Performance
Improved IMCRA with babble noise
[Figure: Segmental SNR]

Slide 18
Performance
Improved IMCRA with babble noise
[Figure: Weighted spectral slope]

Slide 19
Discussion
Conclusions from the results
  • 1-D LBP in the energy domain can distinguish the voiced and unvoiced components of noisy speech signals.
  • LBP in the energy domain is shown to be superior to the G.729 VAD and Navin's LBP VAD.
  • Improved IMCRA outperforms IMCRA, with higher segmental SNR and higher likelihood.
Future work
  • Apply this algorithm as the pre-processing stage of an ASR system.

Slide 20
Acknowledgements
Thanks to Prof. John Soraghan for the idea of babble noise reduction.
Thanks to Paul and Navin for the previous work on 1-D LBP.

Slide 21
Thank you! Any questions?
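
For reference, the segmental SNR measure used in the performance comparison (Slides 15 and 17) can be sketched as below. The slides do not give the frame length or any clamping limits used in the experiments. The per-frame clamp to [-10, 35] dB is a common convention assumed here, and the function name `segmental_snr` and its parameters are hypothetical.

```python
import math


def segmental_snr(clean, enhanced, frame_len, lo=-10.0, hi=35.0):
    """Mean of per-frame SNRs in dB between a clean and an enhanced signal.

    Assumptions: non-overlapping frames, and each frame SNR is clamped
    to [lo, hi] dB before averaging.
    """
    eps = 1e-12  # avoids division by zero and log of zero
    n = min(len(clean), len(enhanced))
    if n < frame_len:
        raise ValueError("signals are shorter than one frame")
    values = []
    for k in range(0, n - frame_len + 1, frame_len):
        c = clean[k:k + frame_len]
        e = enhanced[k:k + frame_len]
        signal_energy = sum(a * a for a in c)
        error_energy = sum((a - b) ** 2 for a, b in zip(c, e))
        snr_db = 10.0 * math.log10((signal_energy + eps) / (error_energy + eps))
        values.append(min(max(snr_db, lo), hi))
    return sum(values) / len(values)


if __name__ == "__main__":
    clean = [1.0] * 8
    enhanced = [0.9] * 8
    print(segmental_snr(clean, enhanced, frame_len=4))
```

Averaging in the dB domain weights quiet and loud frames equally, which is why segmental SNR tends to track perceived quality more closely than a single global SNR.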