Adaptive Multi-Rate Coder Using ACELP (2)
TRANSCRIPT
Guided by: Mr. Vijayendra Desai
Prepared by: Patel Hetal - 82914, Nakrani Ankita - 6247, Vora Mahesh - 5267, Zalavadiya Ashish - 7284
Speech coding is concerned with obtaining a compact digital representation of voice signals for more efficient transmission or smaller storage size.
The objective is to represent the speech signal with the minimum number of bits while maintaining the perceptual quality.
The human vocal apparatus consists of:
– lungs
– trachea (windpipe)
– larynx, which contains two folds of skin called vocal cords that blow apart and flap together as air is forced through
– oral tract
– nasal tract
Speech coder: a device that converts speech to digital form
Types of speech coders:
– Waveform coders: convert any analog signal to digital form
– Vocoders (parametric coders): try to exploit special properties of the speech signal to reduce the bit rate; build a model of speech and transmit the parameters of the model
– Hybrid coders: combine features of waveform coders and vocoders
Type of speech codec
ADPCM: high quality, high bit rate
LPC: low bit rate, low quality
CELP: medium bit rate, good quality
Types of Speech Codecs: Waveform Codecs, Vocoders, Hybrid Codecs
Waveform codecs
◦ Sample and code
◦ High quality and not complex
◦ Large amount of bandwidth
Source codecs (vocoders)
◦ Match the incoming signal to a mathematical model
◦ Linear-predictive filter model of the vocal tract
◦ A voiced/unvoiced flag for the excitation
◦ The information is sent rather than the signal
◦ Low bit rates, but sounds synthetic
◦ Higher bit rates do not improve much
Hybrid codecs
◦ Attempt to provide the best of both
◦ Perform a degree of waveform matching
◦ Utilize the sound production model
◦ Quite good quality at low bit rate
Similar to images, we can also compress speech to make it smaller and easier to store and transmit.
General compression methods such as DPCM can also be used.
More compression can be achieved by taking advantage of the speech production model.
• Major challenge in designing a coder
• High-quality speech across a wide variety of channel conditions
• Traditionally, fixed source/channel bit allocation
• Solution
• Variable bit rate allocation for the source and channel coders
Why Adaptive Multi-Rate (AMR)?
• To satisfy the requirement of a variable bit rate, the quantization parameters of the fixed-rate coders are changed.
• In CELP (Code Excited Linear Predictive coder), the codebook size, gain, linear predictive parameters, etc. are changed to obtain a variable bit rate.
• CELP suffers from a long processing time due to its stochastic codebook.
• ACELP uses an algebraic (well-structured) codebook to solve this problem of CELP.
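The idea of a structured codebook can be sketched as follows. Instead of storing random vectors, an algebraic codevector is described only by a few pulse positions and signs, so it can be built on the fly and searched without a stored table. The subframe length, the interleaved track layout, and the function names below are illustrative assumptions, not the layout of any particular AMR mode.

```python
def algebraic_codevector(positions, signs, subframe_len=40):
    """Build a sparse excitation vector from pulse positions and signs.

    positions: sample indices of the pulses (hypothetical layout)
    signs: +1 or -1 for each pulse
    """
    if len(positions) != len(signs):
        raise ValueError("positions and signs must have the same length")
    code = [0] * subframe_len
    for pos, sign in zip(positions, signs):
        if not 0 <= pos < subframe_len:
            raise ValueError("pulse position outside the subframe")
        if sign not in (1, -1):
            raise ValueError("sign must be +1 or -1")
        code[pos] += sign  # pulses that land on the same sample add up
    return code


def track_positions(track, n_tracks=5, subframe_len=40):
    """Allowed pulse positions for one interleaved track (assumed layout)."""
    return list(range(track, subframe_len, n_tracks))


code = algebraic_codevector(positions=[0, 7, 23, 39], signs=[1, -1, 1, -1])
nonzero = [(i, v) for i, v in enumerate(code) if v != 0]
print(nonzero)
print(track_positions(2))
```

Only the positions and signs need to be transmitted, which is why such a codebook needs no storage and can be searched faster than a stochastic one.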
Operating Modes of AMR
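As a small worked example of what a variable bit rate means per frame, the sketch below converts bit rates into bits per speech frame. The bit rates are the ones that appear in the result tables of this presentation; the 20 ms frame duration and the function name are assumptions made for illustration.

```python
AMR_MODES_KBPS = [4.75, 5.15, 5.9, 6.7, 7.4, 7.95, 10.2, 12.2]
FRAME_MS = 20  # assumption: one speech frame lasts 20 ms


def bits_per_frame(rate_kbps, frame_ms=FRAME_MS):
    """kbit/s multiplied by milliseconds gives bits."""
    return round(rate_kbps * frame_ms)


for rate in AMR_MODES_KBPS:
    print(f"{rate:5.2f} kbps -> {bits_per_frame(rate)} bits per {FRAME_MS} ms frame")
```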
Performance Comparison Between some Standardized Coders
The Speech Signal
[Figure: speech waveform with labels: pitch period, background signal, unvoiced signal (noise-like sound)]
Speech Waveforms and Spectra
[Figure: speech waveform, 100 msec]
• S: silence or background, no speech
• U: unvoiced, no vocal cord vibration (aspiration, unvoiced sounds)
• V: voiced, quasi-periodic speech
Voiced vs. Unvoiced
– Voiced stops are transient sounds produced by building up pressure behind a total constriction in the oral tract and then suddenly releasing the pressure, resulting in a pop-like sound
• /B/ constriction at lips
• /D/ constriction at back of teeth
• /G/ constriction at velum
– Unvoiced stops have no vocal cord vibration during the period of closure => brief period of frication (due to sudden turbulence of escaping air) and aspiration (steady air flow from the glottis) before voiced excitation begins
Pitch and formants
• For certain voiced sounds, your vocal cords vibrate (open and close).
• The rate at which the vocal cords vibrate determines the pitch of your voice.
• For men, the pitch period is 4-20 ms (50-250 Hz).
• For women, the pitch period is 2-8 ms (120-500 Hz).
• The resonant frequencies of the vocal tract tube are called formants.
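One common way to measure the pitch period of a voiced frame is to look for the lag at which the frame best matches a delayed copy of itself. The sketch below is a minimal autocorrelation search, not the method of any specific coder; the 8 kHz sampling rate, the synthetic test frame, and the function name are assumptions, and the 50-500 Hz search range is taken from the pitch ranges above.

```python
def autocorrelation_pitch_lag(frame, fs, min_hz=50.0, max_hz=500.0):
    """Return the lag (in samples) with the largest autocorrelation
    inside the search range [fs / max_hz, fs / min_hz]."""
    min_lag = int(fs / max_hz)
    max_lag = min(int(fs / min_hz), len(frame) - 1)
    best_lag, best_corr = min_lag, float("-inf")
    for lag in range(min_lag, max_lag + 1):
        corr = sum(frame[n] * frame[n - lag] for n in range(lag, len(frame)))
        if corr > best_corr:
            best_lag, best_corr = lag, corr
    return best_lag


fs = 8000          # assumed sampling rate in Hz
true_period = 64   # samples, i.e. 8 ms at 8 kHz

# Synthetic voiced-like frame: one decaying pulse per pitch period.
frame = [0.9 ** (n % true_period) for n in range(480)]

lag = autocorrelation_pitch_lag(frame, fs)
print(f"lag = {lag} samples, period = {1000.0 * lag / fs:.1f} ms, "
      f"pitch = {fs / lag:.1f} Hz")
```

Real speech is only quasi-periodic, so practical estimators usually add windowing, normalization, and checks against picking a multiple of the true period.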
Speech production Model
• Non-uniform probability density function (PDF)
• Non-zero autocorrelation between successive speech samples
• Existence of voiced and unvoiced segments
• Quasi-periodicity of the speech signal
• Speech signals are band limited.
• So they can be sampled at a finite rate, and the signal can be reconstructed from these samples
Probability density function (PDF):
◦ Non-uniform PDF of the speech signal
◦ Very high probability of near-zero amplitude
◦ Significant probability of very high amplitude
◦ Monotonically decreasing function in between
◦ This PDF has a distinct peak at x=0, due to the existence of frequent pauses and low-level speech segments.
◦ A non-uniform quantizer attempts to match the distribution of quantization levels to the PDF of speech.
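A minimal sketch of how a non-uniform quantizer can favor the frequent near-zero amplitudes is µ-law companding: compress the sample, quantize uniformly, then expand. The value µ = 255, the 256 levels, and the function names are assumptions chosen for illustration.

```python
import math

MU = 255  # assumption: the common mu = 255 setting


def mu_law_compress(x, mu=MU):
    """Map x in [-1, 1] to [-1, 1], stretching the region near zero."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(((1 + mu) ** abs(y) - 1) / mu, y)


def quantize_uniform(y, levels=256):
    """Uniform quantizer on [-1, 1] that returns the cell midpoint."""
    step = 2.0 / levels
    index = min(int((y + 1.0) / step), levels - 1)
    return -1.0 + (index + 0.5) * step


def mu_law_quantize(x, levels=256):
    """Compress, quantize uniformly, then expand."""
    return mu_law_expand(quantize_uniform(mu_law_compress(x), levels))


for x in (0.001, 0.01, 0.1, 0.9):
    err_uniform = abs(quantize_uniform(x) - x)
    err_mu = abs(mu_law_quantize(x) - x)
    print(f"x={x:<6} uniform error={err_uniform:.6f} mu-law error={err_mu:.6f}")
```

With the same number of levels, the companded quantizer spends more of them on small amplitudes, so low-level samples are reproduced with a smaller error than with a uniform quantizer.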
Autocorrelation function
Much correlation exists between adjacent samples of a segment of speech.
So, in every sample of speech, there is a large component that can be predicted from the previous samples with a small random error.
All differential and predictive coders are designed based on this property.
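The property above can be illustrated with a first-order predictor: each sample is predicted from the previous one, and only the prediction error would need to be coded. This is a sketch on a synthetic correlated signal, not on real speech; the signal model, the seed, and the function names are assumptions.

```python
import random


def lag1_correlation(x):
    """Normalized correlation between adjacent samples."""
    num = sum(x[n] * x[n - 1] for n in range(1, len(x)))
    den = sum(v * v for v in x)
    return num / den


def prediction_residual(x, a):
    """Error of the predictor x_hat[n] = a * x[n - 1]."""
    return [x[0]] + [x[n] - a * x[n - 1] for n in range(1, len(x))]


def energy(v):
    return sum(s * s for s in v)


# Synthetic stand-in for a correlated signal (assumption, not real speech).
random.seed(0)
x = [0.0]
for _ in range(1999):
    x.append(0.9 * x[-1] + random.gauss(0.0, 1.0))

a = lag1_correlation(x)
residual = prediction_residual(x, a)
print(f"predictor coefficient: {a:.3f}")
print(f"residual energy / signal energy: {energy(residual) / energy(x):.3f}")
```

Because the residual carries much less energy than the signal, it can be quantized with fewer bits for the same quality, which is the basis of differential and predictive coders.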
Power spectral density function (PSD)
The non-flat characteristic of the PSD of speech makes it possible to obtain significant compression by coding speech in the frequency domain.
The long-term average PSD of speech shows that high-frequency components contribute very little to the total speech energy.
So, coding speech separately in different frequency bands can lead to significant coding gain.
Characteristics of speech signal
The quantizer removes irrelevance in the signal, and the operation is irreversible
1) Uniform quantization: amplitude level quantizer
2) Non-uniform quantization: A-law and µ-law companding
3) Adaptive quantization
4) Vector quantization
• Vocoders
• Channel vocoder
• Formant vocoder
• Cepstrum vocoder
• LPC
• LPC vocoder
• Multipulse Excited LPC
• Code-Excited LPC
It uses an analysis and synthesis approach.
The signal needs to be analyzed at the transmitter.
It determines the envelope of the speech signal for a number of frequency bands, then samples and encodes these envelopes and multiplexes them with the encoded outputs of the other filters.
The voiced/unvoiced decision, the energy information for each band, and the pitch frequency are packed and transmitted.
Speech Production Models
Physical Model
Mathematical Model
LPC Decoder
[Block diagram: LPC bit stream → unpack → power index, LPC index, pitch period index, voicing → power decoder, LPC decoder, pitch period decoder; impulse train generator or white noise generator (voiced/unvoiced) → gain computation → synthesis filter → de-emphasis → synthesis speech]
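The decoder blocks listed above can be sketched for a single frame as follows: an impulse train (voiced) or white noise (unvoiced) is scaled by a gain, passed through an all-pole synthesis filter, and then de-emphasized. This is a simplified illustration; the coefficient values, the frame length, the de-emphasis factor, and all function names are assumptions, and filter state is not carried from one frame to the next.

```python
import random


def excitation(voiced, pitch_period, n_samples, rng):
    """Impulse train for voiced frames, white noise for unvoiced frames."""
    if voiced:
        return [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def all_pole_filter(x, coeffs):
    """y[n] = x[n] + sum over k of coeffs[k - 1] * y[n - k]."""
    y = []
    for n, xn in enumerate(x):
        acc = xn
        for k, a in enumerate(coeffs, start=1):
            if n - k >= 0:
                acc += a * y[n - k]
        y.append(acc)
    return y


def lpc_decode_frame(voiced, pitch_period, gain, lpc_coeffs,
                     n_samples=160, deemphasis=0.9, seed=0):
    """Excitation -> gain -> synthesis filter -> de-emphasis."""
    rng = random.Random(seed)
    exc = [gain * e for e in excitation(voiced, pitch_period, n_samples, rng)]
    synth = all_pole_filter(exc, lpc_coeffs)
    return all_pole_filter(synth, [deemphasis])


# Hypothetical decoded parameters for one voiced and one unvoiced frame.
coeffs = [1.3, -0.8]  # stable second-order example, not from real speech
voiced_frame = lpc_decode_frame(True, 64, 0.5, coeffs)
unvoiced_frame = lpc_decode_frame(False, 64, 0.1, coeffs)
print(voiced_frame[:4])
print(len(unvoiced_frame))
```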
Analysis-by-Synthesis Excitation Coding
CELP
[Figure: original speech sample; values shown: 1,1 3,5 6,8 8,8]
CELP Encoder Block Diagram
[Block diagram of the encoder. Blocks: Gaussian excitation codebook; pitch synthesis filter θp(z) (long-term analysis); LP synthesis filter θ(z) (short-term analysis); buffer and LP analysis; perceptual weighting filter W(z); error energy minimization. Signals: speech S(n), synthesized speech S*(n), error e(n), codebook index k, gain θ0, pitch estimate P. The LP parameters and the excitation parameters are sent over the channel.]
Pitch synthesis filter: θp(z) = 1 / (1 − θ0 z^(−P))
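The analysis-by-synthesis loop in the encoder diagram can be sketched as an exhaustive codebook search: every candidate excitation is passed through the LP synthesis filter, scaled by its best gain, and compared with the target. This is a simplified sketch that leaves out the pitch synthesis filter and the perceptual weighting filter W(z); the codebook size, the filter coefficients, and the function names are assumptions.

```python
import random


def synthesize(excitation, lpc_coeffs):
    """All-pole LP synthesis: y[n] = x[n] + sum over k of a_k * y[n - k]."""
    y = []
    for n, xn in enumerate(excitation):
        acc = xn
        for k, a in enumerate(lpc_coeffs, start=1):
            if n - k >= 0:
                acc += a * y[n - k]
        y.append(acc)
    return y


def search_codebook(target, codebook, lpc_coeffs):
    """Return (index, gain, error) of the codevector whose synthesized,
    gain-scaled output has the smallest squared error against the target."""
    best = None
    for index, code in enumerate(codebook):
        synth = synthesize(code, lpc_coeffs)
        synth_energy = sum(v * v for v in synth)
        if synth_energy == 0.0:
            continue
        gain = sum(t * v for t, v in zip(target, synth)) / synth_energy
        error = sum((t - gain * v) ** 2 for t, v in zip(target, synth))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


rng = random.Random(1)
lpc_coeffs = [1.3, -0.8]  # hypothetical stable filter
codebook = [[rng.gauss(0.0, 1.0) for _ in range(40)] for _ in range(64)]

# Build a target that is exactly codevector 17 with gain 2.0.
target = [2.0 * v for v in synthesize(codebook[17], lpc_coeffs)]

index, gain, error = search_codebook(target, codebook, lpc_coeffs)
print(index, round(gain, 6), error < 1e-9)
```

Every codevector has to be filtered and compared, so the search cost grows with the codebook size; this is the processing-time problem of a stochastic codebook that the presentation attributes to CELP.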
CELP bit allocation for AMR
Algebraic Code Excited Linear Predictive (ACELP) Coder
Bitrate (kbps) Process delay SNR MSE
4.75 1.4 11.392 3.73E-04
5.15 1.958 12.3201 3.01E-04
5.9 2.484 10.0201 5.11E-04
6.7 3.555 8.7001 6.93E-04
7.4 5.955 7.6269 8.87E-04
7.95 10.234 7.8191 8.49E-04
10.2 70.658 7.3258 9.51E-04
12.2 284.194 6.2923 0.0012
Results
Five.wav
Bitrate (kbps) Process delay SNR MSE
4.75 3.443 1.6833 0.0195
5.15 4.496 1.9458 1.84E-02
5.9 4.978 2.5418 1.60E-02
6.7 7.297 2.7128 1.54E-02
7.4 12.721 2.7049 1.54E-02
7.95 22.891 3.7974 1.20E-02
10.2 165.706 3.7305 1.22E-02
12.2 652.134 4.312 1.07E-02
Bitrate (kbps) Process delay SNR MSE
4.75 7.073 9.0667 4.03E-04
5.15 6.776 9.6937 3.49E-04
5.9 10.589 9.7603 3.44E-04
6.7 15.565 10.1021 3.19E-04
7.4 25.128 10.3836 2.98E-04
7.95 47.284 11.198 2.48E-04
10.2 387.415 11.9593 2.08E-04
12.2 1387.386 12.292 1.92E-04
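The tables do not state how SNR and MSE were computed. A minimal sketch, assuming the usual definitions (mean squared difference between original and decoded samples, and the ratio of signal energy to error energy in dB), could look like this; the function names and the sample values are illustrative only.

```python
import math


def mse(original, decoded):
    """Mean squared error between two equal-length signals."""
    return sum((o - d) ** 2 for o, d in zip(original, decoded)) / len(original)


def snr_db(original, decoded):
    """Signal-to-noise ratio in dB: signal energy over error energy."""
    signal = sum(o * o for o in original)
    noise = sum((o - d) ** 2 for o, d in zip(original, decoded))
    if noise == 0.0:
        return float("inf")
    return 10.0 * math.log10(signal / noise)


# Made-up samples for illustration, not data from the presentation.
original = [1.0, -1.0, 1.0, -1.0]
decoded = [0.9, -1.1, 1.1, -0.9]
print(f"MSE = {mse(original, decoded):.4f}")
print(f"SNR = {snr_db(original, decoded):.2f} dB")
```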