audio/speech signal processing an overview - iit...
Audio/Speech Signal Processing
An Overview
Application Fields
Sound Mixer: Music Recording
Audio Processor: FM Broadcasting
Synthesizer: Sound Synthesis
Voice call: Noise reduction and Speech Codecs
Signal Processing Tasks
• Audio/Speech Encoding/Decoding - Codecs (DFT – Spectral Analysis, Filtering & Modifications)
• Audio effects (FIR/IIR - Digital Filtering & Spectral Modifications)
Audio/Speech Codecs
Voice Call flow through mobile
Echo Cancellation
Noise Reduction
Speech Codec
Approximate data transfer size for 60 sec Call
Raw Data: (Just analog to digital converted data)
Sampling rate: 8000 samples/sec
Storage space for one sample: 8 bits
Total data size = Number of samples * Storage space for one sample = Samples/sec * Number of seconds * Storage space
= 8000 * 60 * 8 bits = 3840 Kbits
Bit rate = Samples/sec * Storage space for one sample = 64 Kbits/sec
Encoded/Compressed data: (DSP algorithm over sampled digital data)
Bit rate = 6.5 to 13 Kbits/sec (GSM Speech codecs output)
Data size = Transferred bits/sec * Number of seconds
= Bit rate * Number of seconds = 6.5 (13) * 60 = 390 to 780 Kbits
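The arithmetic above can be checked with a few lines of Python. This is only a sketch: the function and variable names are illustrative, and the encoded figures use the 6.5 to 13 Kbits/sec range quoted for GSM speech codecs.

```python
def data_size_kbits(bit_rate_kbps, seconds):
    """Data size in Kbits for a stream at a constant bit rate."""
    return bit_rate_kbps * seconds

SAMPLE_RATE = 8000       # samples/sec
BITS_PER_SAMPLE = 8      # storage space for one sample
CALL_SECONDS = 60

# Raw data: bit rate = samples/sec * bits per sample
raw_bit_rate_kbps = SAMPLE_RATE * BITS_PER_SAMPLE / 1000
raw_size_kbits = data_size_kbits(raw_bit_rate_kbps, CALL_SECONDS)

# Encoded data: lower and upper end of the quoted codec bit rates
encoded_sizes = [data_size_kbits(rate, CALL_SECONDS) for rate in (6.5, 13)]

print("raw bit rate (Kbits/sec):", raw_bit_rate_kbps)
print("raw size (Kbits):", raw_size_kbits)
print("encoded size range (Kbits):", encoded_sizes)
print("compression ratio range:",
      [round(raw_size_kbits / size, 1) for size in encoded_sizes])
```

Under these numbers the codec shrinks the 60 sec call by roughly a factor of 5 to 10.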
Audio Quality Measure
Audio 1: Raw audio at 1411 Kbps
Audio 2: Compressed audio at 128 Kbps
Audio 3: Compressed audio at 32 Kbps
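The raw figure of about 1411 Kbps is consistent with CD-quality PCM. The sketch below assumes 44.1 kHz sampling, 16 bits per sample and 2 channels; these parameters are an assumption, since the slides do not state them. The helper name is illustrative.

```python
def pcm_bit_rate_kbps(sample_rate_hz, bits_per_sample, channels):
    """Bit rate of uncompressed PCM audio in Kbps."""
    return sample_rate_hz * bits_per_sample * channels / 1000

# Assumed CD-quality parameters (not given in the slides)
raw_kbps = pcm_bit_rate_kbps(44100, 16, 2)

# How much smaller each compressed stream is than the raw stream
ratios = {kbps: raw_kbps / kbps for kbps in (128, 32)}

print("raw bit rate (Kbps):", raw_kbps)
for kbps, ratio in ratios.items():
    print(f"{kbps} Kbps stream is about {ratio:.0f}x smaller than raw")
```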
Signal Compression in Frequency domain
Audio/Speech Codecs
Spectrogram: Frequency variation with time
[Figure: three spectrograms (axes: time, frequency): 128 Kbits/sec MP3 encoded audio, 32 Kbits/sec MP3 encoded audio, 1411 Kbits/sec raw audio]
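A spectrogram of this kind can be sketched as a sequence of short DFTs over overlapping, windowed frames. The example below is a minimal pure-Python illustration, not the tool used to produce the slides. The frame length, hop size, Hann window and test signal are all assumptions chosen to keep it small.

```python
import cmath
import math

def dft_magnitudes(frame):
    """Magnitudes of DFT bins 0..N/2 of a real-valued frame (naive O(N^2) DFT)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
        for k in range(n // 2 + 1)
    ]

def spectrogram(signal, frame_len=64, hop=32):
    """One magnitude spectrum per Hann-windowed frame (rows = time, columns = frequency)."""
    window = [0.5 - 0.5 * math.cos(2 * math.pi * i / frame_len) for i in range(frame_len)]
    rows = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        rows.append(dft_magnitudes(frame))
    return rows

SAMPLE_RATE = 8000   # samples/sec
FRAME_LEN = 64
N = 1024

# Test signal: 500 Hz tone in the first half, 2000 Hz tone in the second half
signal = [
    math.sin(2 * math.pi * (500 if t < N // 2 else 2000) * t / SAMPLE_RATE)
    for t in range(N)
]

spec = spectrogram(signal, frame_len=FRAME_LEN, hop=32)
bin_hz = SAMPLE_RATE / FRAME_LEN   # frequency spacing between DFT bins

def peak_frequency(row):
    """Frequency (Hz) of the strongest bin in one spectrogram row."""
    return max(range(len(row)), key=row.__getitem__) * bin_hz

peak_first = peak_frequency(spec[0])
peak_last = peak_frequency(spec[-1])
print("dominant frequency, first frame:", peak_first)
print("dominant frequency, last frame:", peak_last)
```

The dominant bin moves from the low tone to the high tone as time advances, which is the frequency variation with time that the spectrogram displays. A real implementation would use an FFT rather than the naive DFT shown here.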
Audio and Speech Codecs
Audio Frequency Range: 20 Hz – 20 kHz
Speech Frequency Range: 300 Hz – 3500 Hz
Speech Codecs: (Linear Prediction approach)
AMR, G.723.1
bitrate: 4.75 - 12.2 Kbits/sec (AMR), 5.3 or 6.3 Kbits/sec (G.723.1)
Sampling rate: 8 - 16 kHz
Audio Codecs: (MDCT, Psychoacoustic analysis)
MP3, AAC
bitrate: 32-768 Kbits/sec
Sampling rate: 8 - 48 kHz
Audio/Sound Effects – Android Apps
Audio Effects
• Intelligent Loudness Control (Automatic Gain Control)
• Wideband Automatic Noise Removal (WANR)
• Envelope/Stereo Processing
• Voice/Vocal Enhancement
• Bass Enhancement
• Sibilant/Fricative Smoothing
• Dynamic Listening Fatigue Reduction (DLFR)
• Multi-Band Graphic Equalizer (Equalizer)
• Low Pass Filtering
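As an illustration of the last item, the simplest FIR low-pass filter is a moving average. The sketch below is not what any particular app implements. The tap count, sample rate and test tones are assumptions picked so that the effect is easy to see.

```python
import math

def moving_average(x, taps=8):
    """FIR low-pass: each output is the mean of the last `taps` inputs.
    Samples before the start of the signal are treated as zero."""
    y = []
    for n in range(len(x)):
        recent = x[max(0, n - taps + 1): n + 1]
        y.append(sum(recent) / taps)
    return y

def tone(freq_hz, sample_rate, n):
    """Unit-amplitude sine tone of n samples."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]

def peak(y, skip=8):
    """Largest absolute sample, ignoring the filter's start-up transient."""
    return max(abs(v) for v in y[skip:])

SAMPLE_RATE = 8000
low_out = moving_average(tone(100, SAMPLE_RATE, 400))    # well inside the passband
high_out = moving_average(tone(1000, SAMPLE_RATE, 400))  # falls on a null of this filter

print("100 Hz peak after filtering:", round(peak(low_out), 3))
print("1000 Hz peak after filtering:", round(peak(high_out), 3))
```

With 8 taps at 8000 samples/sec, the filter has a null at 1000 Hz, so that tone is removed almost completely while the 100 Hz tone passes nearly unchanged.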
Echo Effect: Information in Time domain
Signal delay:
y(t) = x(t) + decay*x(t-delay)
Raw Sound:
Echoed Sound:
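In discrete time the delay equation becomes y[n] = x[n] + decay * x[n - delay_samples]. The following is a minimal sketch of that single-echo form. The function names are illustrative, and the output is extended by the delay so the echo tail is not cut off.

```python
def seconds_to_samples(delay_sec, sample_rate):
    """Convert a delay in seconds to a whole number of samples."""
    return int(round(delay_sec * sample_rate))

def add_echo(x, delay_samples, decay):
    """y[n] = x[n] + decay * x[n - delay_samples], with room for the echo tail."""
    y = [0.0] * (len(x) + delay_samples)
    for n, sample in enumerate(x):
        y[n] += sample                           # direct sound
        y[n + delay_samples] += decay * sample   # delayed, attenuated copy
    return y

# A single impulse makes the echo easy to see
impulse = [1.0, 0.0, 0.0, 0.0]
echoed = add_echo(impulse, delay_samples=2, decay=0.5)
print(echoed)  # [1.0, 0.0, 0.5, 0.0, 0.0, 0.0]

# Example conversion: a 0.25 sec delay at 8000 samples/sec
print(seconds_to_samples(0.25, 8000))  # 2000
```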
Bass Enhancement: Information in Frequency domain
Subwoofer: reproduces low-pitched audio frequencies known as bass (e.g. drum sounds)
Frequency range : 20-200Hz
Bass system frequency response
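One simple way to emphasise the 20-200 Hz range is to extract the low frequencies with a low-pass filter and add them back to the signal with some gain. The sketch below uses a one-pole IIR low-pass for this. It is an assumed, simplified approach rather than the bass system shown in the slides, and the cutoff, gain and test tones are illustrative.

```python
import math

def one_pole_lowpass(x, cutoff_hz, sample_rate):
    """One-pole IIR low-pass: y[n] = y[n-1] + a * (x[n] - y[n-1])."""
    a = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    y, state = [], 0.0
    for sample in x:
        state += a * (sample - state)
        y.append(state)
    return y

def bass_boost(x, sample_rate, cutoff_hz=200.0, gain=1.0):
    """Add a scaled copy of the low-frequency content back onto the signal."""
    low = one_pole_lowpass(x, cutoff_hz, sample_rate)
    return [s + gain * l for s, l in zip(x, low)]

def tone(freq_hz, sample_rate, n):
    """Unit-amplitude sine tone of n samples."""
    return [math.sin(2 * math.pi * freq_hz * t / sample_rate) for t in range(n)]

def peak(y, skip=200):
    """Largest absolute sample after the filter has settled."""
    return max(abs(v) for v in y[skip:])

SAMPLE_RATE = 8000
bass_peak = peak(bass_boost(tone(50, SAMPLE_RATE, 1000), SAMPLE_RATE))
treble_peak = peak(bass_boost(tone(2000, SAMPLE_RATE, 1000), SAMPLE_RATE))

print("50 Hz tone peak after boost:", round(bass_peak, 2))
print("2000 Hz tone peak after boost:", round(treble_peak, 2))
```

The 50 Hz tone comes out at nearly twice its original amplitude, while the 2000 Hz tone is left almost unchanged.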
Resources
QA Community:
Signal Processing Stack exchange
http://dsp.stackexchange.com/
Open Source Contribution:
Audacity: Free Audio Editor and Recorder
audacity.sourceforge.net/
FFmpeg (solution to record, convert and stream audio and video)
https://www.ffmpeg.org/
Resources
Indian Research Start-ups:
• ATC Labs, Noida
• Violet 3D, Bangalore
• Akshar Speech Technologies, Hyderabad
Research Labs:
• Fraunhofer Institute, Germany
• Dolby Laboratories
• Philips Research
• DTS/SRS Labs
Acknowledgment
Special thanks to,
Prof. Naren Naik
&
ATC Labs, Noida, India
Thanks for your time.