[Advanced] Speech & Audio Signal Processing
ES 157/257: Speech and Audio Processing
Prof. Patrick Wolfe, Harvard DEAS
02 February 2006
State of the Art in Speech/Audio
Speech and audio processing may be divided into “low-level” and “high-level” inference
Speech enhancement, compression, and coding are all widely used technologies; this low-level work is the most mature
High-level tasks will drive future advances: speech/music database information retrieval; automatic speaker and speech recognition
But low-level issues also remain…
Fundamental Questions
How to obtain highly structured representations of speech and audio signals?
Time-frequency “atoms” as building blocks
How can statistical inference enable advances in speech signal processing?
A means to obtain an “atomic decomposition”
Statistical modeling of time-frequency coefficients provides a principled solution
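Statistical modeling needs a set of time-frequency coefficients to model. A minimal sketch, assuming the short-time Fourier transform (STFT) as the atomic decomposition: each coefficient is the inner product of the signal with a Hann-windowed complex exponential. The window, frame length, hop size, and the helper name `stft` are illustrative assumptions, not choices taken from the course.

```python
import numpy as np

def stft(x, frame_len=256, hop=128):
    """Return STFT coefficients with shape (num_frames, frame_len // 2 + 1).

    Row m holds the DFT of the m-th Hann-windowed frame, i.e. the inner
    products of that frame with windowed complex exponentials ("atoms").
    Assumes len(x) >= frame_len.
    """
    window = np.hanning(frame_len)
    num_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([window * x[m * hop : m * hop + frame_len]
                       for m in range(num_frames)])
    return np.fft.rfft(frames, axis=1)

if __name__ == "__main__":
    fs = 8000
    t = np.arange(fs) / fs
    x = np.sin(2 * np.pi * 500 * t)           # 1 s, 500 Hz test tone
    X = stft(x)
    peak_bin = int(np.argmax(np.abs(X).mean(axis=0)))
    print(X.shape, peak_bin * fs / 256)       # prints (61, 129) 500.0
```

The bin spacing is fs / frame_len = 31.25 Hz here, so a longer frame gives finer frequency resolution at the cost of coarser time resolution; that trade-off is what a choice of atoms fixes.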
Representative Applications
Missing data in the context of VoIP: Original, Missing, Restored
Source/speaker separation: Source 1, Source 2; Mixture 1, Mixture 2; Recovery 1, Recovery 2
Digital Speech/Audio Processing
Speech Production
Time-Scale Modification
Male & female speaker: Original, Fast, Faster, Slower
Trumpet: Original, Fast, Slow
Speech and quasi-periodic audio: sinewave-based modification, voicing-dependent rate factor
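The sinewave model represents speech as a sum of sinusoids whose amplitudes and frequencies vary slowly from frame to frame. A minimal sketch of time-scale modification under that model, assuming the per-frame amplitude and frequency tracks have already been estimated (peak picking and track matching are not shown) and using a constant rate factor rather than the voicing-dependent one on the slide. The function name `sinewave_time_scale` is hypothetical.

```python
import numpy as np

def sinewave_time_scale(amps, freqs, hop, fs, rate=1.0):
    """Resynthesize a sum of sinusoids from frame-wise tracks, time-scaled.

    amps, freqs: arrays of shape (num_frames, num_partials); freqs in Hz,
    measured every `hop` samples of the original signal.
    rate > 1 gives a longer (slower) output, rate < 1 a shorter (faster) one.
    Track values are placed every round(hop * rate) output samples and
    linearly interpolated; each phase is the running sum of its
    instantaneous frequency, so frequencies (and hence pitch) are unchanged.
    """
    out_hop = int(round(hop * rate))
    num_frames, num_partials = amps.shape
    frame_pos = np.arange(num_frames) * out_hop
    n = np.arange((num_frames - 1) * out_hop)
    y = np.zeros(len(n))
    for k in range(num_partials):
        a = np.interp(n, frame_pos, amps[:, k])
        f = np.interp(n, frame_pos, freqs[:, k])
        y += a * np.cos(2 * np.pi * np.cumsum(f) / fs)
    return y

if __name__ == "__main__":
    fs, hop = 8000, 80
    amps = np.tile([1.0, 0.5], (11, 1))        # 11 frames, 2 partials
    freqs = np.tile([200.0, 400.0], (11, 1))   # harmonics of 200 Hz
    slow = sinewave_time_scale(amps, freqs, hop, fs, rate=2.0)
    fast = sinewave_time_scale(amps, freqs, hop, fs, rate=0.5)
    print(len(slow), len(fast))                # prints 1600 400
```

Because phase is re-integrated from frequency rather than copied, stretching the tracks changes duration without the pitch shift that simple resampling would cause.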
More Time-Scale Modification
Falling can, bongo drums, loon: Original, Slow
Complex non-speech signals: phase-vocoder-based modification, event-dependent phase coherence
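The phase vocoder works directly on STFT frames: frames are analysed at one hop size and resynthesized at another, with each bin's phase re-accumulated so that sinusoidal components stay continuous across frames. A minimal sketch, assuming a Hann window, `n_fft = 512`, and a synthesis hop of 128 (illustrative values). The event-dependent phase coherence on the slide, which treats transients specially, is not implemented here, and `phase_vocoder_stretch` is a hypothetical name.

```python
import numpy as np

def phase_vocoder_stretch(x, rate, n_fft=512, hop=128):
    """Time-stretch x by about `rate` (> 1 = longer/slower) with a basic phase vocoder.

    Frames are analysed every ana_hop samples and resynthesized every `hop`
    samples. Each bin's measured phase increment is unwrapped around the
    increment expected at the bin centre, converted to an instantaneous
    frequency, and re-accumulated over the synthesis hop.
    The realized stretch factor is hop / ana_hop. Assumes len(x) >= n_fft.
    """
    window = np.hanning(n_fft)
    ana_hop = max(1, int(round(hop / rate)))
    num_frames = (len(x) - n_fft) // ana_hop + 1
    omega = 2 * np.pi * np.arange(n_fft // 2 + 1) / n_fft    # bin centres, rad/sample
    out = np.zeros((num_frames - 1) * hop + n_fft)
    prev_phase = acc_phase = None
    for m in range(num_frames):
        spec = np.fft.rfft(window * x[m * ana_hop : m * ana_hop + n_fft])
        phase = np.angle(spec)
        if acc_phase is None:
            acc_phase = phase
        else:
            dphi = phase - prev_phase - omega * ana_hop
            dphi = (dphi + np.pi) % (2 * np.pi) - np.pi         # wrap to [-pi, pi)
            acc_phase = acc_phase + (omega + dphi / ana_hop) * hop
        prev_phase = phase
        frame = np.fft.irfft(np.abs(spec) * np.exp(1j * acc_phase), n_fft)
        out[m * hop : m * hop + n_fft] += window * frame
    # Overlapped squared Hann windows sum to about sum(w**2) / hop.
    return out / (np.sum(window ** 2) / hop)

if __name__ == "__main__":
    fs = 8000
    x = np.sin(2 * np.pi * 500 * np.arange(fs) / fs)
    y = phase_vocoder_stretch(x, rate=2.0)
    print(len(x), len(y))                     # prints 8000 15488
```

Resetting or locking phases at detected onsets, rather than accumulating them blindly as above, is one way to keep transients such as drum hits from smearing.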
Pitch and Vocal Tract Change
Male & female speaker: Original, Low pitch/long vocal tract, High pitch/short vocal tract
Male speaker: Original, Monotone
Sinewave-based modification
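Under a sinewave (here, harmonic) model, pitch and vocal-tract length can be changed independently: the fundamental frequency sets where the harmonics fall, and a spectral envelope standing in for the vocal tract sets their amplitudes. A minimal sketch, assuming a known f0 track and a known envelope function (estimating either from real speech is not shown); the single-resonance envelope, the scale parameters, and the name `harmonic_modify` are illustrative assumptions.

```python
import numpy as np

def harmonic_modify(f0_track, envelope, fs, pitch_scale=1.0, tract_scale=1.0,
                    monotone=False):
    """Synthesize a voiced sound as harmonics of f0 shaped by a spectral envelope.

    f0_track: fundamental frequency in Hz, one value per output sample.
    envelope: callable mapping frequency in Hz to vocal-tract magnitude.
    pitch_scale multiplies f0; tract_scale > 1 moves the envelope's formants
    up (a shorter tract), < 1 moves them down. monotone=True flattens f0 to
    its mean before scaling.
    """
    f0 = np.asarray(f0_track, dtype=float)
    if monotone:
        f0 = np.full_like(f0, f0.mean())
    f0 = f0 * pitch_scale
    num_harm = int(0.5 * fs / f0.max())        # keep every harmonic below Nyquist
    y = np.zeros(len(f0))
    for k in range(1, num_harm + 1):
        fk = k * f0
        phase = 2 * np.pi * np.cumsum(fk) / fs
        y += envelope(fk / tract_scale) * np.cos(phase)
    return y

def formant(f):
    """Illustrative single resonance centred at 500 Hz."""
    return 1.0 / (1.0 + ((f - 500.0) / 100.0) ** 2)

if __name__ == "__main__":
    fs = 8000
    f0_track = np.linspace(100.0, 140.0, fs)   # 1 s of rising pitch
    original = harmonic_modify(f0_track, formant, fs)
    high_short = harmonic_modify(f0_track, formant, fs,
                                 pitch_scale=1.5, tract_scale=1.2)
    flat = harmonic_modify(f0_track, formant, fs, monotone=True)
```

Sampling the envelope at the new harmonic frequencies, instead of moving the old harmonic amplitudes along with them, is what keeps formants in place when only the pitch is changed.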
Speech Coding
Female speaker: Original, CELP 8000 bps, Sine 4800 bps, Sine 2400 bps
Male speaker: Original, CELP 8000 bps, Sine 4800 bps, Sine 2400 bps
CELP: code-excited linear prediction; Sine: sinewave-based coding
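CELP models speech as an excitation, chosen from a codebook, driving an all-pole short-term synthesis filter whose coefficients come from linear-prediction (LPC) analysis of each frame. A minimal sketch of just that LPC step, assuming the autocorrelation method solved by the Levinson-Durbin recursion; codebook search, pitch prediction, and quantization (where the bit rates above are actually spent) are not shown. A predictor order of about 10 is common for 8 kHz speech.

```python
import numpy as np

def lpc(frame, order):
    """Linear-prediction coefficients via the autocorrelation method.

    Returns (a, err) with a = [1, a_1, ..., a_p] minimizing the energy of
    e[n] = sum_k a_k x[n - k], solved by the Levinson-Durbin recursion;
    err is the final prediction-error energy.
    """
    x = np.asarray(frame, dtype=float)
    r = np.array([np.dot(x[: len(x) - k], x[k:]) for k in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + np.dot(a[1:i], r[i - 1 : 0 : -1])
        k = -acc / err                                  # reflection coefficient
        a[1:i] = a[1:i] + k * a[i - 1 : 0 : -1]
        a[i] = k
        err *= 1.0 - k * k
    return a, err

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.standard_normal(20000)
    x = np.zeros_like(w)
    for n in range(2, len(x)):    # synthetic AR(2): x[n] = 1.3 x[n-1] - 0.6 x[n-2] + w[n]
        x[n] = 1.3 * x[n - 1] - 0.6 * x[n - 2] + w[n]
    a, err = lpc(x, 2)
    print(np.round(a, 2))         # expect values near [1, -1.3, 0.6]
```

The recursion costs O(p^2) per frame rather than the O(p^3) of a general linear solve, and each reflection coefficient k has magnitude below 1 when the autocorrelation is positive definite, which keeps the synthesis filter stable.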
Noise Reduction
Cell phone noise, cocktail party, automobile noise: Original, Enhanced
Adaptive Wiener filter; adaptation based on spectral change
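A minimal sketch of frequency-domain Wiener filtering, assuming stationary noise whose power spectrum is estimated from leading noise-only frames. The slide's adaptation based on spectral change is not reproduced; the gain floor, frame sizes, and the name `wiener_denoise` are illustrative assumptions.

```python
import numpy as np

def wiener_denoise(y, n_fft=256, hop=64, noise_frames=10, gain_floor=0.1):
    """Suppress additive noise with a per-bin Wiener gain in the STFT domain.

    Assumes the first `noise_frames` frames contain noise only and that the
    noise is stationary. The gain is S / (S + N), where N is that noise
    power estimate and S = max(|Y|^2 - N, 0) estimates the speech power.
    """
    window = np.hanning(n_fft)
    num_frames = (len(y) - n_fft) // hop + 1
    Y = np.stack([np.fft.rfft(window * y[m * hop : m * hop + n_fft])
                  for m in range(num_frames)])
    power = np.abs(Y) ** 2
    noise_psd = power[:noise_frames].mean(axis=0)
    speech_psd = np.maximum(power - noise_psd, 0.0)
    gain = np.maximum(speech_psd / (speech_psd + noise_psd + 1e-12), gain_floor)
    out = np.zeros((num_frames - 1) * hop + n_fft)
    for m in range(num_frames):
        out[m * hop : m * hop + n_fft] += window * np.fft.irfft(gain[m] * Y[m], n_fft)
    # Overlapped squared Hann windows sum to about sum(w**2) / hop.
    return out / (np.sum(window ** 2) / hop)
```

The gain floor trades residual noise against "musical noise": letting the gain reach zero removes more noise but leaves isolated spectral peaks that sound tonal. An adaptive version would update `noise_psd` and smooth the gain over time instead of fixing both from the first frames.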
Compression
Low-noise case: Original, 1.5 dB reduction, 3.0 dB reduction
High-noise case: Original, 1.5 dB reduction, 3.0 dB reduction
Reduction of peak-to-RMS amplitude ratio, based on sinewave analysis/synthesis
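The peak-to-RMS ratio, 20 log10(max|x| / rms(x)), depends on the relative phases of a signal's sinusoidal components and not only on their amplitudes, which is one reason a sinewave analysis/synthesis approach can reduce it: changing the phases can lower the peaks while leaving the magnitude spectrum untouched. A minimal sketch illustrating that effect on a synthetic harmonic tone, assuming quadratic (Schroeder-style) phase dispersion as the phase rule; it illustrates the principle and is not the processing behind the examples above. The helper names are hypothetical.

```python
import numpy as np

def peak_to_rms_db(x):
    """Peak-to-RMS amplitude ratio (crest factor) in dB."""
    x = np.asarray(x, dtype=float)
    return 20 * np.log10(np.max(np.abs(x)) / np.sqrt(np.mean(x ** 2)))

def harmonic_tone(num_harm, f0, fs, duration, phases):
    """Equal-amplitude harmonics of f0 with the given starting phases."""
    n = np.arange(int(duration * fs))
    return sum(np.cos(2 * np.pi * k * f0 * n / fs + phases[k - 1])
               for k in range(1, num_harm + 1))

if __name__ == "__main__":
    K, f0, fs = 20, 100.0, 8000
    k = np.arange(1, K + 1)
    zero_phase = harmonic_tone(K, f0, fs, 1.0, np.zeros(K))
    dispersed = harmonic_tone(K, f0, fs, 1.0, np.pi * k ** 2 / K)  # Schroeder-style phases
    print(round(peak_to_rms_db(zero_phase), 2))                    # prints 16.02
    print(peak_to_rms_db(dispersed) < peak_to_rms_db(zero_phase))  # prints True
```

With K equal-amplitude harmonics in phase, the peak is K and the RMS is sqrt(K/2), giving a ratio of sqrt(2K) (about 16 dB for K = 20); both signals above have identical magnitude spectra, so any reduction comes from the phases alone.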