[Advanced] Speech & Audio Signal Processing
ES 157/257: Speech and Audio Processing
Prof. Patrick Wolfe, Harvard DEAS
02 February 2006


Page 1: [Advanced] Speech & Audio Signal Processing

[Advanced] Speech & Audio Signal Processing

ES 157/257: Speech and Audio Processing
Prof. Patrick Wolfe, Harvard DEAS

02 February 2006

Page 2: [Advanced] Speech & Audio Signal Processing

State of the Art in Speech/Audio

Speech and audio processing may be divided into “low-level” and “high-level” inference
  Speech enhancement, compression, and coding are all widely used technologies
  This low-level work is the most mature
High-level tasks will drive future advances
  Speech/music database information retrieval
  Automatic speaker and speech recognition
But low-level issues also remain…

Page 3: [Advanced] Speech & Audio Signal Processing

Fundamental Questions

How to obtain highly structured representations of speech and audio signals?
  Time-frequency “atoms” as building blocks
How can statistical inference enable advances in speech signal processing?
  A means to obtain an “atomic decomposition”
  Statistical modeling of time-frequency coefficients provides a principled solution
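To make the idea of a time-frequency "atom" concrete, here is a minimal sketch in plain Python. It assumes a Gabor-style atom (a Gaussian-windowed complex sinusoid), which is one common choice and not necessarily the one used in the lecture. The function names gabor_atom and coefficient and all parameter values are hypothetical, chosen only for illustration.

```python
import cmath
import math

def gabor_atom(length, center, freq, width):
    """Gaussian-windowed complex sinusoid, scaled to unit energy.

    center and width are in samples; freq is in cycles per sample.
    (Assumed atom shape; the slides do not specify one.)
    """
    atom = [
        math.exp(-0.5 * ((n - center) / width) ** 2)
        * cmath.exp(2j * math.pi * freq * n)
        for n in range(length)
    ]
    norm = math.sqrt(sum(abs(a) ** 2 for a in atom))
    return [a / norm for a in atom]

def coefficient(signal, atom):
    """Inner product of the signal with one atom (one time-frequency coefficient)."""
    return sum(s * a.conjugate() for s, a in zip(signal, atom))

# Test signal: a cosine at 0.1 cycles per sample.
signal = [math.cos(2 * math.pi * 0.1 * n) for n in range(256)]

# An atom at the signal's frequency yields a large coefficient;
# an atom far away in frequency yields a coefficient near zero.
matched = abs(coefficient(signal, gabor_atom(256, 128, 0.1, 20.0)))
mismatched = abs(coefficient(signal, gabor_atom(256, 128, 0.3, 20.0)))
```

An atomic decomposition then amounts to choosing a small set of such atoms whose coefficients explain most of the signal. How that set is chosen, for example through the statistical modeling the slide mentions, is outside the scope of this sketch.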

Page 4: [Advanced] Speech & Audio Signal Processing

Representative Applications

Missing data in the context of VoIP: Original, Missing, Restored
Source / Speaker Separation
  Source 1, Source 2
  Mixture 1, Mixture 2
  Recovery 1, Recovery 2

Page 5: [Advanced] Speech & Audio Signal Processing

Digital Speech/Audio Processing

Page 6: [Advanced] Speech & Audio Signal Processing

Speech Production

Page 7: [Advanced] Speech & Audio Signal Processing

Time-Scale Modification

Page 8: [Advanced] Speech & Audio Signal Processing

Time-Scale Modification

Male & Female Speaker: Original, Fast, Faster, Slower
Trumpet: Original, Fast, Slow
Speech and Quasi-Periodic Audio
  Sinewave-based Modification
  Voicing-dependent Rate Factor

Page 9: [Advanced] Speech & Audio Signal Processing

More Time-Scale Modification

Falling Can, Bongo Drums, Loon: Original, Slow
Complex Non-Speech Signals
  Phase-Vocoder-based Modification
  Event-Dependent Phase Coherence

Page 10: [Advanced] Speech & Audio Signal Processing

Pitch and Vocal Tract Change

Male & Female Speaker: Original, Low pitch/Long vocal tract, High pitch/Short vocal tract
Male Speaker: Original and Monotone
Sinewave-based Modification

Page 11: [Advanced] Speech & Audio Signal Processing

Speech Coding

Female Speaker: Original, CELP 8000 bps, Sine 4800 bps, Sine 2400 bps
Male Speaker: Original, CELP 8000 bps, Sine 4800 bps, Sine 2400 bps
Sinewave-based
Code-Excited Linear Prediction
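The linear prediction step underlying CELP can be sketched with the standard Levinson-Durbin recursion, which solves for predictor coefficients from a signal's autocorrelation. This is only the prediction stage under textbook assumptions. It does not implement the excitation codebook search or the bit allocation of any of the coders demonstrated on the slide, and the function name levinson_durbin is hypothetical.

```python
def levinson_durbin(r, order):
    """Solve for linear-prediction coefficients from autocorrelation values.

    r holds autocorrelation lags r[0] .. r[order].
    Returns (a, err): a[0] == 1.0 and the prediction-error filter is
    A(z) = sum_k a[k] z^-k; err is the final prediction-error power.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        # Correlation between the current prediction error and lag i.
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err  # reflection coefficient for this order
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err

# Autocorrelation of a first-order autoregressive process with pole 0.5:
# r[k] = 0.5 ** k. A second-order fit should recover a[1] = -0.5, a[2] = 0.
coeffs, residual_power = levinson_durbin([1.0, 0.5, 0.25], 2)
```

In a speech coder, the autocorrelation would be computed per frame from windowed speech, and the resulting coefficients describe the vocal tract filter for that frame.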

Page 12: [Advanced] Speech & Audio Signal Processing

Noise Reduction

Cell Phone Noise, Cocktail Party, Automobile Noise: Original, Enhanced
Adaptive Wiener Filter
  Adaptation Based on Spectral Change
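As a rough illustration of Wiener-style noise reduction, the sketch below computes a per-frequency-bin gain from the noisy-signal power and an estimate of the noise power. It assumes the common power-subtraction form of the gain and a fixed noise estimate, so it omits the adaptation based on spectral change that the slide describes. The name wiener_gain and the floor parameter are hypothetical.

```python
def wiener_gain(noisy_power, noise_power, floor=0.0):
    """Per-bin Wiener-style gain: estimated clean power over noisy power.

    noisy_power and noise_power are sequences of non-negative power values,
    one per frequency bin. The clean power is estimated as the noisy power
    minus the noise power, clipped at zero. floor sets a minimum gain.
    """
    gains = []
    for p, n in zip(noisy_power, noise_power):
        if p <= 0.0:
            gains.append(floor)
            continue
        g = max(p - n, 0.0) / p
        gains.append(max(g, floor))
    return gains

# Bins well above the noise level are mostly kept; bins at or below it are removed.
example_gains = wiener_gain([4.0, 1.0, 0.5], [1.0, 1.0, 1.0])
```

Each gain would multiply the corresponding short-time spectral coefficient of the noisy signal before resynthesis. A nonzero floor is often used in practice to limit audible artifacts.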

Page 13: [Advanced] Speech & Audio Signal Processing

Compression

Low-noise case: Original, 1.5 dB Reduction, 3.0 dB Reduction
High-noise case: Original, 1.5 dB Reduction, 3.0 dB Reduction
Reduction of Peak-to-RMS amplitude ratio
  Based on Sinewave Analysis/Synthesis
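The quantity being reduced here, the peak-to-RMS amplitude ratio (also called the crest factor), is straightforward to measure. The sketch below computes it in decibels. The function name peak_to_rms_db and the test signals are hypothetical, and the sketch only measures the ratio; it does not perform the sinewave-based reduction itself.

```python
import math

def peak_to_rms_db(samples):
    """Return the peak-to-RMS amplitude ratio of a signal in decibels."""
    peak = max(abs(s) for s in samples)
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(peak / rms)

# A sinusoid sampled over whole periods has a ratio of about 3.01 dB.
sine = [math.sin(2 * math.pi * 5 * n / 1000) for n in range(1000)]
sine_db = peak_to_rms_db(sine)

# A square wave has equal peak and RMS values, so its ratio is 0 dB.
square = [1.0 if n % 100 < 50 else -1.0 for n in range(1000)]
square_db = peak_to_rms_db(square)
```

A "3.0 dB reduction" in this ratio means the processed signal's peak sits 3 dB closer to its RMS level, which leaves room to raise the average level without clipping.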