Adaptive Multi-Rate Coder Using ACELP

Guided by: Mr. Vijayendra Desai

Prepared by: Patel Hetal-82914, Nakrani Ankita-6247, Vora Mahesh-5267, Zalavadiya Ashish-7284

Speech coding is concerned with obtaining a compact digital representation of voice signals, for more efficient transmission or smaller storage size.

The objective is to represent the speech signal with the minimum number of bits while maintaining its perceptual quality.
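To put "minimum number of bits" into numbers, the sketch below compares uncompressed PCM with the highest and lowest AMR rates that appear in the results tables of this document. The 8 kHz / 16-bit PCM reference is an assumption (typical for narrowband telephone speech), and the helper names are hypothetical.

```python
def pcm_bit_rate_kbps(sample_rate_hz, bits_per_sample):
    """Bit rate of uncompressed linear PCM, in kbit/s."""
    return sample_rate_hz * bits_per_sample / 1000.0


def compression_ratio(reference_kbps, coded_kbps):
    """How many times fewer bits the coded stream needs than the reference."""
    return reference_kbps / coded_kbps


# Assumption: narrowband speech sampled at 8 kHz with 16-bit linear PCM.
pcm = pcm_bit_rate_kbps(8000, 16)  # 128.0 kbit/s
for rate in (12.2, 4.75):  # highest and lowest AMR rates in the results tables
    print(f"{rate} kbit/s: {compression_ratio(pcm, rate):.1f}x fewer bits than PCM")
```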

The human vocal apparatus consists of:
– lungs
– trachea (windpipe)
– larynx, which contains two folds of skin called the vocal cords; these blow apart and flap together as air is forced through
– oral tract
– nasal tract

Speech coder: a device that converts speech into a digital representation.

Types of speech coders
– Waveform coders: convert any analog signal to digital form
– Vocoders (parametric coders): exploit special properties of the speech signal to reduce the bit rate; they build a model of speech and transmit the parameters of the model
– Hybrid coders: combine features of waveform coders and vocoders

Types of speech codec
– ADPCM: high quality, high bit rate
– LPC: low bit rate, low quality
– CELP: medium bit rate, good quality

Types of Speech Codecs: Waveform Codecs, Vocoders, Hybrid Codecs

Waveform codecs
◦ Sample and code
◦ High quality and low complexity
◦ Require a large amount of bandwidth

Source codecs (vocoders)
◦ Match the incoming signal to a mathematical model
◦ Linear-predictive filter model of the vocal tract
◦ A voiced/unvoiced flag for the excitation
◦ The model parameters are sent rather than the signal
◦ Low bit rates, but the output sounds synthetic
◦ Higher bit rates do not improve quality much

Hybrid codecs
◦ Attempt to provide the best of both
◦ Perform a degree of waveform matching
◦ Use the sound production model
◦ Quite good quality at low bit rates

Like images, speech can be compressed to make it smaller and easier to store and transmit.

General compression methods such as DPCM can also be used.
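A minimal first-order DPCM sketch: each sample is predicted as the previous reconstructed sample, and only the quantized prediction error is sent. The unit predictor coefficient, the uniform quantizer and the step size are assumptions for illustration; practical DPCM/ADPCM coders use better predictors and adapt the step.

```python
def dpcm_encode(samples, step):
    """First-order DPCM encoder: quantize the error between each sample and
    the previous *reconstructed* sample with a uniform step."""
    codes = []
    prediction = 0.0
    for x in samples:
        q = round((x - prediction) / step)  # quantizer index sent to the decoder
        codes.append(q)
        prediction += q * step  # mirror the decoder so errors do not accumulate
    return codes


def dpcm_decode(codes, step):
    """Rebuild samples by accumulating the dequantized prediction errors."""
    out = []
    prediction = 0.0
    for q in codes:
        prediction += q * step
        out.append(prediction)
    return out


step = 0.25  # assumed quantizer step size
samples = [0.0, 0.5, 1.2, 1.9, 2.1, 1.4, 0.3]
codes = dpcm_encode(samples, step)
print(codes)
print(dpcm_decode(codes, step))
```

Because the encoder tracks the decoder's reconstruction, each decoded sample stays within half a step of the original.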

More compression can be achieved by taking advantage of the speech production model.

• Major challenge in coder design: delivering high-quality speech across a wide variety of channel conditions

• Traditionally, the split of bits between the source coder and the channel coder is fixed

• Solution: variable bit-rate allocation between the source coder and the channel coder

Why Adaptive Multi-Rate (AMR)?

• To satisfy the requirement of a variable bit rate, the quantization parameters of fixed-rate coders are changed.

• In CELP (Code Excited Linear Predictive coder), the codebook size, gains, linear prediction parameters, etc. are changed to obtain a variable bit rate.

• CELP suffers from long processing time because of its stochastic codebook.

• ACELP solves this problem of CELP by using an algebraic (well-structured) codebook.
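An algebraic codevector is mostly zeros with a few ±1 pulses, so it can be generated from pulse positions and signs instead of being stored and searched as a large table. The sketch assumes a 40-sample subframe split into 5 interleaved tracks (track t holds positions t, t+5, ..., t+35); this layout is illustrative, and the pulse count and bit packing differ between AMR modes. All names are hypothetical.

```python
SUBFRAME = 40  # samples per subframe (assumed)
TRACKS = 5     # interleaved pulse tracks (assumed)


def track_positions(track):
    """Positions that a pulse on the given track may occupy."""
    return list(range(track, SUBFRAME, TRACKS))


def build_codevector(pulses):
    """pulses: list of (position, sign) with sign +1 or -1.
    Returns the sparse excitation vector; coinciding pulses add up."""
    c = [0] * SUBFRAME
    for pos, sign in pulses:
        if not 0 <= pos < SUBFRAME:
            raise ValueError("pulse position outside the subframe")
        if sign not in (1, -1):
            raise ValueError("pulse sign must be +1 or -1")
        c[pos] += sign
    return c


def index_bits(pulses_per_track):
    """Naive index size: each pulse costs a position index within its track
    plus one sign bit. Real codecs may pack this more tightly."""
    positions_per_track = SUBFRAME // TRACKS            # 8 positions
    pos_bits = (positions_per_track - 1).bit_length()   # 3 bits per position
    return TRACKS * pulses_per_track * (pos_bits + 1)


c = build_codevector([(0, +1), (7, -1), (19, +1)])
print([i for i, v in enumerate(c) if v])  # [0, 7, 19]
```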

Operating Modes of AMR
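AMR codes speech in 20 ms frames, so each mode's bit rate fixes how many bits are available per frame. The sketch below uses the eight rates that appear in the results tables of this document; the helper name is hypothetical.

```python
FRAME_MS = 20  # AMR frame length in milliseconds
AMR_MODES_KBPS = [4.75, 5.15, 5.90, 6.70, 7.40, 7.95, 10.2, 12.2]


def bits_per_frame(rate_kbps, frame_ms=FRAME_MS):
    """kbit/s multiplied by ms gives bits; round() absorbs float error."""
    return round(rate_kbps * frame_ms)


for rate in AMR_MODES_KBPS:
    print(f"{rate:5.2f} kbit/s -> {bits_per_frame(rate)} bits per 20 ms frame")
```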

Performance Comparison Between some Standardized Coders


The Speech Signal

Figure: speech waveform with the pitch period, the background signal and an unvoiced (noise-like) segment marked.


Speech Waveforms and Spectra

Figure: speech waveform (100 ms time scale).

• S: silence (background, no speech)

• U: unvoiced, no vocal cord vibration (aspiration, unvoiced sounds)

• V: voiced, quasi-periodic speech
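A common way to separate S, U and V segments is to combine short-time energy (silence is quiet) with the zero-crossing rate (noise-like unvoiced speech crosses zero far more often than voiced speech). A minimal sketch, assuming 20 ms frames at 8 kHz; the thresholds are hypothetical and would need tuning on real recordings.

```python
import math


def short_time_energy(frame):
    """Mean squared amplitude of the frame."""
    return sum(x * x for x in frame) / len(frame)


def zero_crossing_rate(frame):
    """Fraction of adjacent sample pairs whose signs differ."""
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    return crossings / (len(frame) - 1)


def classify(frame, energy_floor=1e-4, zcr_split=0.3):
    """Label a frame 'S' (silence), 'U' (unvoiced) or 'V' (voiced).
    Both thresholds are hypothetical."""
    if short_time_energy(frame) < energy_floor:
        return "S"
    return "U" if zero_crossing_rate(frame) > zcr_split else "V"


fs = 8000  # assumed sampling rate
voiced = [0.5 * math.sin(2 * math.pi * 150 * n / fs) for n in range(160)]
print(classify(voiced))  # V
```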


Voiced vs. Unvoiced

– Voiced stops are transient sounds produced by building up pressure behind a total constriction in the oral tract and then suddenly releasing it, resulting in a pop-like sound:
• /B/ constriction at the lips
• /D/ constriction at the back of the teeth
• /G/ constriction at the velum

– Unvoiced stops have no vocal cord vibration during the period of closure; the release is followed by a brief period of friction (due to the sudden turbulence of escaping air) and aspiration (steady air flow from the glottis) before voiced excitation begins.


Pitch and Formants

• For voiced sounds, the vocal cords vibrate (open and close).

• The rate at which the vocal cords vibrate determines the pitch of the voice.

• For men, the pitch period is about 4-20 ms (50-250 Hz).

• For women, the pitch period is about 2-8 ms (120-500 Hz).

• The resonant frequencies of the vocal tract tube are called formants.
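The pitch period of a voiced frame can be estimated by finding the lag at which the frame best matches a shifted copy of itself, i.e. the autocorrelation peak. A minimal sketch; the 50-500 Hz search range is taken from the pitch ranges above, while the 8 kHz rate and the synthetic test frame are assumptions. Real coders refine this (normalization, fractional lags, closed-loop search).

```python
import math


def autocorr(x, lag):
    """Unnormalized autocorrelation of x at the given lag."""
    return sum(x[n] * x[n - lag] for n in range(lag, len(x)))


def estimate_pitch(frame, fs, f_min=50.0, f_max=500.0):
    """Return (period_ms, f0_hz) for the lag with the largest autocorrelation
    inside the pitch search range."""
    min_lag = int(fs / f_max)
    max_lag = min(int(fs / f_min), len(frame) - 1)
    best_lag = max(range(min_lag, max_lag + 1), key=lambda k: autocorr(frame, k))
    return 1000.0 * best_lag / fs, fs / best_lag


fs = 8000  # assumed sampling rate
# Synthetic 40 ms frame: 100 Hz fundamental plus its 2nd and 3rd harmonics
frame = [sum(math.sin(2 * math.pi * 100 * h * n / fs) for h in (1, 2, 3)) for n in range(320)]
print(estimate_pitch(frame, fs))  # (10.0, 100.0)
```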


Speech Production Model

• Non-uniform probability density function (PDF)

• Non-zero autocorrelation between successive speech samples

• Existence of voiced and unvoiced segments

• Quasi-periodicity of the speech signal

• Speech signals are band-limited, so they can be sampled at a finite rate and reconstructed from those samples

Probability density function (PDF)
◦ The PDF of speech is non-uniform: very high probability of near-zero amplitude, significant probability of very high amplitude, and a monotonically decreasing function in between.

◦ The PDF has a distinct peak at x = 0, due to frequent pauses and low-level speech segments.

◦ A non-uniform quantizer attempts to match the distribution of quantization levels to the PDF of speech.

Autocorrelation function
◦ Adjacent samples of a speech segment are highly correlated, so a large part of each sample can be predicted from the previous samples, leaving only a small random error.

◦ All differential and predictive coders are designed around this property.
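Predictive coders turn this correlation into predictor coefficients by solving the normal equations built from the autocorrelation. A standard way to do this is the Levinson-Durbin recursion, sketched below under the convention x̂[n] = Σ a[k]·x[n-k]; the function names are illustrative, and the recursion assumes the prediction error stays non-zero.

```python
def autocorrelation(x, order):
    """r[0..order] of the sequence x (unnormalized)."""
    return [sum(x[n] * x[n - k] for n in range(k, len(x))) for k in range(order + 1)]


def levinson_durbin(r, order):
    """Solve for predictor coefficients a[1..order] so that
    x[n] is approximated by sum_k a[k] * x[n-k].
    Returns (coefficients, final prediction error energy)."""
    a = [0.0] * (order + 1)
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] - sum(a[j] * r[i - j] for j in range(1, i))
        k = acc / err  # reflection coefficient of stage i
        new_a = a[:]
        new_a[i] = k
        for j in range(1, i):
            new_a[j] = a[j] - k * a[i - j]
        a = new_a
        err *= 1.0 - k * k  # error energy shrinks at every stage
    return a[1:], err


# Usage: autocorrelation of a first-order process, r[k] = 0.9**k
coeffs, residual = levinson_durbin([1.0, 0.9, 0.81], order=2)
print(coeffs, residual)  # approximately [0.9, 0.0] and 0.19
```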

Power spectral density function (PSD)
◦ The non-flat PSD of speech makes it possible to obtain significant compression by coding speech in the frequency domain.

◦ The long-term average PSD of speech shows that high-frequency components contribute very little to the total speech energy.

◦ So coding speech in separate frequency bands can lead to significant coding gain.

Characteristics of speech signal

The quantizer removes irrelevance from the signal; the operation is irreversible.

1) Uniform quantization: amplitude-level quantizer
2) Non-uniform quantization: A-law and µ-law companding
3) Adaptive quantization
4) Vector quantization
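Companding realizes a non-uniform quantizer: the signal is compressed with a logarithmic curve, quantized uniformly, then expanded, so small (frequent) amplitudes get finer steps. The sketch uses the continuous µ-law curve with µ = 255; G.711 itself uses a piecewise-linear approximation of this curve, and the 8-bit mid-rise quantizer is an assumption.

```python
import math

MU = 255.0  # mu value used by G.711 mu-law


def mu_law_compress(x, mu=MU):
    """Map x in [-1, 1] to [-1, 1], stretching small amplitudes."""
    return math.copysign(math.log1p(mu * abs(x)) / math.log1p(mu), x)


def mu_law_expand(y, mu=MU):
    """Inverse of mu_law_compress."""
    return math.copysign(((1.0 + mu) ** abs(y) - 1.0) / mu, y)


def quantize_uniform(y, bits):
    """Uniform mid-rise quantizer on [-1, 1] with 2**bits levels."""
    levels = 2 ** bits
    step = 2.0 / levels
    index = min(int((y + 1.0) / step), levels - 1)
    return -1.0 + (index + 0.5) * step


x = 0.01  # a quiet sample on a [-1, 1] full scale
direct = quantize_uniform(x, 8)
companded = mu_law_expand(quantize_uniform(mu_law_compress(x), 8))
print(abs(direct - x), abs(companded - x))  # companding gives the smaller error here
```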

• Vocoders
◦ Channel vocoder
◦ Formant vocoder
◦ Cepstrum vocoder

• LPC
◦ LPC vocoder
◦ Multipulse Excited LPC
◦ Code-Excited LPC

The vocoder uses an analysis-synthesis approach: the signal is analyzed at the transmitter, which determines the envelope of the speech signal in a number of frequency bands, then samples and encodes these envelopes and multiplexes them with the encoded output of the filter.

The voiced/unvoiced decision, the energy in each band and the pitch frequency are packed and transmitted.
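The band-energy part of this analysis can be sketched for a single frame as follows. The vocoder described above measures band envelopes with a filter bank; this sketch swaps in a direct DFT for brevity, and the band edges, frame length and function names are hypothetical.

```python
import cmath
import math


def dft_power(frame):
    """Power of DFT bins 0..N/2 (direct O(N^2) DFT, fine for a short frame)."""
    n = len(frame)
    return [
        abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))) ** 2
        for k in range(n // 2 + 1)
    ]


def band_energies(frame, fs, edges_hz):
    """Sum the DFT power falling in each [lo, hi) band between consecutive edges."""
    power = dft_power(frame)
    n = len(frame)
    return [
        sum(p for k, p in enumerate(power) if lo <= k * fs / n < hi)
        for lo, hi in zip(edges_hz, edges_hz[1:])
    ]


fs = 8000  # assumed sampling rate
frame = [math.sin(2 * math.pi * 1000 * t / fs) for t in range(160)]  # 20 ms, 1 kHz tone
print(band_energies(frame, fs, [0, 500, 1500, 4001]))  # energy lands in the middle band
```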

Speech Production Models

Figure: physical model and mathematical model of speech production.

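Figure: LPC decoder block diagram. The LPC bit stream is unpacked into the pitch period index, voicing flag, power index and LPC index. The pitch period decoder drives an impulse train generator (voiced speech) and a white noise generator is used for unvoiced speech, selected by the voiced/unvoiced decision; the power decoder feeds the gain computation; the LPC decoder sets the synthesis filter, followed by de-emphasis to give the synthesized speech.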
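The core of this decoder (excitation selection, gain and all-pole synthesis filter) can be sketched as below, using the same predictor convention s[n] = e[n] + Σ a[k]·s[n-k]. De-emphasis and carrying filter memory across frames are omitted, and the function name, the Gaussian noise source and the seed are assumptions.

```python
import random


def lpc_synthesize(a, gain, n_samples, voiced, pitch_period=None, seed=0):
    """One frame of LPC vocoder synthesis: an impulse train (voiced) or white
    noise (unvoiced), scaled by gain and passed through 1/A(z) with zero
    initial filter state."""
    rng = random.Random(seed)
    if voiced:
        if not pitch_period:
            raise ValueError("voiced frames need a pitch period in samples")
        excitation = [1.0 if n % pitch_period == 0 else 0.0 for n in range(n_samples)]
    else:
        excitation = [rng.gauss(0.0, 1.0) for _ in range(n_samples)]
    out = []
    for n in range(n_samples):
        s = gain * excitation[n]
        for k, ak in enumerate(a, start=1):
            if n - k >= 0:
                s += ak * out[n - k]  # all-pole feedback from past outputs
        out.append(s)
    return out


# Usage: a one-pole filter excited by an impulse every 4 samples
print(lpc_synthesize([0.5], 1.0, 8, voiced=True, pitch_period=4))
```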

Analysis-by-Synthesis Excitation Coding

Figure: CELP, original speech sample.

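CELP Encoder Block Diagram

Figure: CELP encoder. Input speech s(n) is buffered for LP analysis (short-term analysis). Excitation vectors from a Gaussian excitation codebook, scaled by a gain, pass through the pitch synthesis filter (long-term analysis, pitch estimate P) and the LP synthesis filter to give the synthesized speech s*(n). The error e(n) between s(n) and s*(n) is passed through the perceptual weighting filter W(z), and the error energy is minimized to select the codebook index k. The LP parameters and excitation parameters (index, gain, pitch) are sent to the channel.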
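The error-minimization loop of the encoder can be sketched as a simplified analysis-by-synthesis search: every codevector is filtered through the LP synthesis filter, given its least-squares optimal gain, and the index with the smallest squared error against the target is kept. This sketch assumes zero filter memory and omits the perceptual weighting filter W(z) and the pitch filter; the codebook size, seed and function names are hypothetical.

```python
import random


def synthesize(excitation, a):
    """All-pole LP synthesis filter 1/A(z), zero initial state:
    s[n] = e[n] + sum_k a[k] * s[n-k]."""
    out = []
    for n, e in enumerate(excitation):
        out.append(e + sum(ak * out[n - k] for k, ak in enumerate(a, start=1) if n - k >= 0))
    return out


def search_codebook(target, codebook, a):
    """Return (index, gain, error) of the codevector whose scaled synthetic
    output best matches the target in the squared-error sense."""
    best = None
    for index, code in enumerate(codebook):
        y = synthesize(code, a)
        energy = sum(v * v for v in y)
        if energy == 0.0:
            continue
        gain = sum(t * v for t, v in zip(target, y)) / energy  # least-squares gain
        error = sum((t - gain * v) ** 2 for t, v in zip(target, y))
        if best is None or error < best[2]:
            best = (index, gain, error)
    return best


def gaussian_codebook(size, length, seed=0):
    """Stochastic codebook of Gaussian vectors (fixed seed for repeatability)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(length)] for _ in range(size)]


cb = gaussian_codebook(16, 40)
a = [0.8, -0.2]  # assumed stable LP coefficients
target = [2.5 * v for v in synthesize(cb[5], a)]
print(search_codebook(target, cb, a)[:2])  # index 5 with a gain of about 2.5
```

The exhaustive loop over a stored Gaussian codebook is exactly the cost that an algebraic codebook is meant to reduce.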

CELP bit allocation for AMR

Algebraic Code Excited Linear Predictive (ACELP) Coder

Bitrate (kbps)   Processing delay   SNR (dB)   MSE
4.75             1.4                11.392     3.73E-04
5.15             1.958              12.3201    3.01E-04
5.9              2.484              10.0201    5.11E-04
6.7              3.555              8.7001     6.93E-04
7.4              5.955              7.6269     8.87E-04
7.95             10.234             7.8191     8.49E-04
10.2             70.658             7.3258     9.51E-04
12.2             284.194            6.2923     0.0012

Results: Five.wav


Bitrate (kbps)   Processing delay   SNR (dB)   MSE
4.75             3.443              1.6833     0.0195
5.15             4.496              1.9458     1.84E-02
5.9              4.978              2.5418     1.60E-02
6.7              7.297              2.7128     1.54E-02
7.4              12.721             2.7049     1.54E-02
7.95             22.891             3.7974     1.20E-02
10.2             165.706            3.7305     1.22E-02
12.2             652.134            4.312      1.07E-02


Bitrate (kbps)   Processing delay   SNR (dB)   MSE
4.75             7.073              9.0667     4.03E-04
5.15             6.776              9.6937     3.49E-04
5.9              10.589             9.7603     3.44E-04
6.7              15.565             10.1021    3.19E-04
7.4              25.128             10.3836    2.98E-04
7.95             47.284             11.198     2.48E-04
10.2             387.415            11.9593    2.08E-04
12.2             1387.386           12.292     1.92E-04
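The tables do not state how SNR and MSE were computed. A common pair of definitions, assuming the original and decoded signals are time-aligned sample by sample, is sketched below; the SNR and MSE columns within each table move together in the way this definition (SNR = 10·log10 of signal power over MSE) predicts. Function names are illustrative.

```python
import math


def mse(original, decoded):
    """Mean squared error between time-aligned signals."""
    return sum((x - y) ** 2 for x, y in zip(original, decoded)) / len(original)


def snr_db(original, decoded):
    """SNR = 10*log10(signal power / noise power), in dB."""
    signal_power = sum(x * x for x in original) / len(original)
    return 10.0 * math.log10(signal_power / mse(original, decoded))


original = [1.0, -1.0, 1.0, -1.0]
decoded = [0.9, -0.9, 0.9, -0.9]
print(mse(original, decoded), snr_db(original, decoded))  # about 0.01 and 20 dB
```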
