
    University of California at Los Angeles

Department of Electrical Engineering

EE 214A: Digital Speech Processing, Winter 2002

Wideband Speech Coding with Linear Predictive Coding (LPC)

Professor: Abeer Alwan

Authors: Ozgu Ozun, Philipp Steurer, Daniel Thell

    Abstract

Wideband speech signals of 2 males and 2 females were coded using an improved version of Linear Predictive Coding (LPC). The sampling frequency was 16 kHz and the coded bit rate was 15450 bits per second, compared with an original bit rate of 128000 bits per second. The tradeoffs between bit rate, end-to-end delay, speech quality and complexity were analyzed. Simulations as well as Segmental SNR evaluations were done to analyze the performance of the implemented algorithm.

    Table of Contents

    ABSTRACT

    1 INTRODUCTION

    2 BACKGROUND

    3 PROJECT DESCRIPTION

    3.1 METHODOLOGY

    3.2 PRE-EMPHASIS FILTER

    3.3 QUANTIZATION OF LPC-COEFFICIENTS

    4 VOICE-EXCITED LPC VOCODER

    4.1 DCT OF RESIDUAL SIGNAL

4.2 PERFORMANCE ANALYSIS

4.2.1 Bit Rates

4.2.2 Overall Delay of the System

4.2.3 Computational Complexity

4.2.4 Objective Performance Evaluation

    5 DISCUSSION OF RESULTS

5.1 QUALITY

    5.1.1 Subjective quality

    5.1.2 Segmental signal to noise ratio

    5.2 QUALITY-PERFORMANCE TRADEOFFS

5.2.1 Bit rate performance

5.2.2 Delay and computational complexity


    6 CONCLUSIONS

    7 REFERENCES

    8 APPENDIX

    8.1 MAIN FILE

    8.2 LPC OUTPUT INFORMATION GENERATION

8.3 PLAIN LPC VOCODER

8.3.1 Main file

8.3.2 Plain LPC decoder

    8.4 VOICE-EXCITED LPC VOCODER

8.4.1 Main File

8.4.2 Voice-excited LPC decoder

    WAVE FILES

    1 Introduction

    Speech coding has been and still is a major issue in the area of digital speech

processing. Speech coding is the act of transforming the speech signal at hand into a more compact form that can be transmitted or stored using considerably less memory. The motivation behind this is that access to an unlimited amount of bandwidth is not possible. Therefore, there is a need to code and compress speech

    signals. Speech compression is required in long-distance communication, high-quality

    speech storage, and message encryption. For example, in digital cellular technology

    many users need to share the same frequency bandwidth. Utilizing speech

    compression makes it possible for more users to share the available system. Another

    example where speech compression is needed is in digital voice storage. For a fixed

amount of available memory, compression makes it possible to store longer messages [1].

Speech coding is a lossy type of coding, which means that the output signal does not sound exactly like the input; a listener can distinguish the two signals. Coding of audio, however, is a different kind of problem than speech coding. Audio coding tries to code the audio in a perceptually lossless way. This


means that even though the input and output signals are not mathematically equivalent, the sound at the output is perceived to be the same as the input. This type of coding is used in applications for audio storage, broadcasting, and Internet streaming [2].

Several techniques of speech coding exist, such as Linear Predictive Coding (LPC), Waveform Coding and Subband Coding. The problem at hand is to use LPC to code 2 male and 2 female speech sentences. The speech signals to be coded are wideband signals with frequencies ranging from 0 to 8 kHz, and the sampling frequency should be 16 kHz with a maximum end-to-end delay of 100 ms. Different types of applications have different delay constraints: for example, in network telephony only a delay of 1 ms is acceptable, whereas a delay of 500 ms is permissible in video telephony [3]. Another constraint at hand is not to exceed an overall bit rate of 16 kbps. Finally, the system must require fewer than 20 million operations per second (MOPS).

The speech coder that will be developed is going to be analyzed using both subjective and objective analysis. Subjective analysis will consist of listening to the encoded speech signal and making judgments on its quality; the quality of the played-back speech will be based solely on the opinion of the listener, who may rate the speech as impossible to understand, intelligible, or natural sounding.

Even though this is a valid measure of quality, an objective analysis will be introduced to technically assess the speech quality and to minimize human bias. The objective analysis will be performed by computing the Segmental Signal to Noise Ratio (SEGSNR) between the original and the coded speech signals. Furthermore, the effects of bit rate, complexity and end-to-end delay on the output speech quality will be analyzed. The report will be concluded with a summary of results and some ideas for future work.

    2 Background

There are several different methods to successfully accomplish speech coding. The main categories of speech coders are LPC vocoders, waveform coders and subband coders. The speech coding in this project will be accomplished by using a modified version of the LPC-10 technique. Linear Predictive Coding is one possible technique for analyzing and synthesizing human speech. The exact details of the analysis and synthesis used to solve our problem will be discussed in the methodology section; only an overview is included here, along with the other types of coding techniques mentioned above.

The LPC method has been in use for a long time. Texas Instruments developed a monolithic PMOS speech synthesizer integrated circuit as early as 1978. This marked the first time the human vocal tract had been electronically duplicated on a single chip


of silicon [5]. This early speech synthesizer used LPC to accomplish successful synthesis. LPC makes coding at low bit rates possible; for LPC-10, the bit rate is about 2.4 kbps. Even though this method results in artificial-sounding speech, it is intelligible. This method has found extensive use in military applications, where high speech quality is not as important as a low bit rate that allows for heavy encryption of secret data. However, since high-quality speech is required in the commercial market, engineers are faced with using other techniques that normally use higher bit rates and result in higher quality output. In LPC-10, the vocal tract is represented as a time-varying filter and speech is windowed about every 20 ms. For each frame, the gain and only 10 coefficients of a linear prediction filter are coded for analysis and decoded for synthesis. In 1996, LPC-10 was replaced by the mixed-excitation linear prediction (MELP) coder as the United States Federal Standard for coding at 2.4 kbps. The MELP coder is an improvement on the LPC method, with additional features: mixed excitation, aperiodic pulses, adaptive spectral enhancement and pulse dispersion filtering, as mentioned in [4].

Waveform coders, on the other hand, are concerned with producing a reconstructed signal whose waveform is as close as possible to the original signal, without any information about how the signal to be coded was generated. Therefore, in theory, coders of this type should be input-signal independent and work for both speech and non-speech input signals [4]. Waveform coders produce good quality speech at bit rates above 16 kbps. However, if the bit rate is decreased below 16 kbps, the quality deteriorates quickly. One form of waveform coding is Pulse Code

    Modulation (PCM). This type of waveform coding involves sampling and quantizing

the input signal. PCM is a memoryless coding algorithm, as mentioned in [4]. Another type of PCM is Differential Pulse Code Modulation (DPCM). This method quantizes the difference between the original and the predicted signals, the next sample being predicted from the previous samples. This is possible because speech samples are correlated, owing to the effects of the vocal tract and the vibrations of the vocal cords [6]. It is possible to improve both the predictor and the quantizer in DPCM by making them adaptive, so that they match the characteristics of the speech to be coded. Such coders are called Adaptive Differential Pulse Code Modulation (ADPCM) coders.

One other type of speech coder is the subband coder. This type of coding uses filter bank analysis to filter the input signal into several frequency bands, and bits are allocated to each band by a certain criterion [4]. Presently, however, subband coders are not widely used for speech coding: it is very difficult to create high-quality speech at low bit rates with this technique. As suggested in [4], subband coding is mostly utilized in medium to high bit rate applications of speech coding.


    3 Project Description

3.1 Methodology

    Fig. 3-1: Block diagram of an LPC vocoder.

In this section an explanation of the LPC speech coding technique will be given, together with the specific modifications and additions made to improve this algorithm. However, before jumping into the detailed methodology of our solution, it will be helpful to give a brief overview of speech production. Speech signals consist of sequences of sounds, and each sound can be thought of as carrying its own piece of information. There are voiced and unvoiced types of speech sounds, and the fundamental difference between the two comes from the way they are produced. Voiced sounds are produced by the vibrations of the vocal cords, and the rate at which the vocal cords vibrate dictates the pitch of the sound. Unvoiced sounds, on the other hand, do not rely on the vibration of the vocal cords: the vocal cords remain open, and constrictions of the vocal tract force air out to produce them [7]. Nasal sounds are produced when the velum is lowered so that the nasal tract is acoustically coupled with the vocal tract [7].

The LPC technique will be utilized in order to analyze and synthesize speech signals. This method can successfully estimate basic speech parameters such as pitch, formants and spectra. A block diagram of an LPC vocoder can be seen in Fig. 3-1. The principle behind the use of LPC is to minimize the sum of the squared differences between the original speech signal and the estimated speech signal over a finite duration; this yields a unique set of predictor coefficients [7]. The predictor coefficients, denoted a_k, are normally estimated every frame, where a frame is typically 20 ms long. Another important parameter is the gain G. The transfer function of the time-varying digital filter is given by

H(z) = G / (1 - sum_{k=1}^{p} a_k z^(-k))

where p is the order of the LPC model.
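As a minimal illustration of this analysis step (not part of the project code), the following Matlab lines compute the predictor coefficients and the gain for one frame using the built-in lpc function, which implements the autocorrelation method; the variable frame is assumed to hold one windowed frame of speech samples.

p = 18; % order of the LPC model
[a, g2] = lpc(frame, p); % a = [1, -a1, ..., -ap], g2 = prediction error power
G = sqrt(g2); % gain of the synthesis filter
est = filter([0 -a(2:end)], 1, frame); % one-step linear prediction of the frame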


    3.2 Pre-emphasis Filter

From the speech production model it is known that speech undergoes a spectral tilt of -6 dB/oct. To counteract this, a pre-emphasis filter of the following form is used (alpha = 0.9378 in our implementation, the preemp value in the appendix code):

H_p(z) = 1 - alpha z^(-1)

The frequency response of a typical pre-emphasis filter is shown in Fig. 3-3, together with that of its inverse filter, 1 / (1 - alpha z^(-1)), which is applied during the synthesis/reconstruction of the speech signal.

Fig. 3-3: Frequency response of the pre-emphasis filter and its inverse filter

The main goal of the pre-emphasis filter is to boost the higher frequencies in order to flatten the spectrum. To give an idea of the improvement made by this filter, the reader is referred to the frequency spectrum of the vowel /i/ in the word nine, plotted in Fig. 3-4; it can be seen how the spectrum is flattened. This improvement leads to a better result in the calculation of the coefficients using LPC: higher peaks are visible at higher frequencies in the LPC spectrum, as can be seen in Fig. 3-5, so the coefficients corresponding to higher frequencies can be better estimated.


    Fig. 3-4: Frequency spectrum of the vowel /i/ in the word nine.

    Fig. 3-5: Spectrum of the LPC model for the vowel /i/ in the word nine.
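As a minimal sketch of the pre-emphasis step (the appendix code applies the same filters through its preemp parameter), with x an assumed vector of speech samples:

alpha = 0.9378; % pre-emphasis coefficient, the value used in the appendix code
y = filter([1 -alpha], 1, x); % pre-emphasis: y(n) = x(n) - alpha*x(n-1)
xr = filter(1, [1 -alpha], y); % inverse (de-emphasis) filter applied at synthesis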


3.3 Quantization of LPC coefficients

Usually, direct quantization of the predictor coefficients is not considered. To ensure stability of the synthesis filter (its poles must lie within the unit circle in the z-plane), a relatively high accuracy (8-10 bits per coefficient) would be required, because small changes in the predictor coefficients lead to relatively large changes in the pole positions. Two possible alternatives to avoid this problem are discussed in [7]; only one of them is explained here, namely the partial reflection coefficients (PARCOR). These are intermediate values in the calculation of the well-known Levinson-Durbin recursion, and quantizing these intermediate values is less problematic than quantizing the predictor coefficients directly. A necessary and sufficient condition for stability in terms of the PARCOR values k_i is |k_i| < 1 for all i. Should a pole nevertheless not lie inside the unit circle, its location is simply reflected back inside the circle, as we discussed in class.
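As a hedged sketch (assuming the Signal Processing Toolbox is available), the PARCOR values fall out of the Levinson-Durbin recursion directly in Matlab; s denotes one windowed speech frame and p the predictor order, both assumptions for illustration:

p = 18;
r = xcorr(s, p, 'biased'); % autocorrelation of the frame, lags -p..p
[a, e, k] = levinson(r(p+1:end), p); % a: predictor coeffs, k: PARCOR values
stable = all(abs(k) < 1); % stability condition on the reflection coefficients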

    4 Voice-excited LPC Vocoder

As the test of the sound quality of a plain LPC-10 vocoder showed, the weakest part of this methodology is the voice excitation. It is known from the literature [7] that one solution to improve the sound quality is the use of voice-excited LPC vocoders. Systems of this type have been studied by Atal et al. [8] and Weinstein [9]. Fig. 4-1 shows a block diagram of a voice-excited LPC vocoder. The main difference from a plain LPC-10 vocoder, as shown in Fig. 3-1, is the excitation detector, which will be explained in the sequel.

    Fig. 4-1: Block diagram of a voice-excited LPC vocoder.

The main idea behind the voice excitation is to avoid the imprecise detection of the pitch and the use of an impulse train while synthesizing the speech; one should rather try to come up with a better estimate of the excitation signal. Thus the input speech signal in each frame is filtered with the estimated transfer function of the LPC analyzer. This filtered signal is called the residual. If this signal is transmitted to the receiver, a very good quality can be achieved. The tradeoff, however, is a higher bit rate, although there is no longer a need to transfer the pitch frequency and the voiced/


unvoiced information. We therefore looked for a solution to reduce the bit rate to 16 kbits/sec, which is described in the following section.

4.1 DCT of residual signal

First of all, for a good reconstruction of the excitation, only the low frequencies of the residual signal are needed. To achieve a high compression rate we employed the discrete cosine transform (DCT) of the residual signal. It is known that the DCT concentrates most of the energy of the signal in its first few coefficients. Thus one way to compress the signal is to transmit only the coefficients that contain most of the energy. Our tests and simulations showed that these coefficients can even be quantized using only 4 bits. The receiver simply performs an inverse DCT and uses the resulting signal as the voice excitation.
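The following lines sketch this compression scheme (the full version is in section 8.4.1; note that the appendix code keeps 50 coefficients per frame, while the bit-rate budget in Table 4-2 assumes 40). Here resid is assumed to be the 480 x nframes matrix of residual frames produced by the LPC analysis:

R = dct(resid); % column-wise DCT of each residual frame
R = [R(1:40,:); zeros(440, size(R,2))]; % keep only the first 40 coefficients
R = udecode(uencode(R, 4), 4); % 4-bit uniform quantization and dequantization
resid_hat = idct(R); % receiver side: the inverse DCT gives the excitation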

    4.2 Performance Analysis

    4.2.1 Bit Rates

In the sequel, the necessary bit rates of the two solutions are computed. The bit rate for a plain LPC vocoder is shown in Table 4-1 and the bit rate for a voice-excited LPC vocoder with DCT is shown in Table 4-2. The following parameters were fixed for the calculation:

Speech signal bandwidth B = 8 kHz

Sampling rate Fs = 16000 Hz (samples/sec)

Window length (frame): 20 ms, which results in 320 samples per frame at the given sampling rate Fs

Overlap: 10 ms (overlapping is needed for perfect reconstruction); hence the actual window length is 30 ms, or 480 samples

Frames per second: 50

Number of predictor coefficients of the LPC model: 18

                           Number of bits per frame
Predictor coefficients     18 * 8 = 144
Gain                       5
Pitch period               6
Voiced/unvoiced switch     1
Total                      156
Overall bit rate           50 * 156 = 7800 bits/second


    Table 4-1: Bit rate for plain LPC vocoder

                           Number of bits per frame
Predictor coefficients     18 * 8 = 144
Gain                       5
DCT coefficients           40 * 4 = 160
Total                      309
Overall bit rate           50 * 309 = 15450 bits/second

    Table 4-2: Bit rate for voice-excited LPC vocoder with DCT
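The two table totals can be checked with one line of Matlab arithmetic each:

rate_lpc = 50 * (18*8 + 5 + 6 + 1) % = 7800 bits/second (plain LPC)
rate_velpc = 50 * (18*8 + 5 + 40*4) % = 15450 bits/second (voice-excited LPC)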

    4.2.2 Overall Delay of the System

The overall delay of the system tells one how long it takes from the input of the first sample of speech into the system until the first sample of the synthesized speech is available at the output of the system. This is clearly an important number, since one would like to process the data in real time. If, for example, an LPC vocoder is used in a communication system, e.g. a cellular phone, a large delay of the transmitted speech signal is not acceptable: humans can perceive speech delays of a few hundred milliseconds during a telephone conversation. For our project the limit was set to a maximum allowed value of 100 ms.

For both of our proposed solutions the overall delay is 30 ms, since the window length is 20 ms and the overlap is 10 ms. In other words, the system needs at least 30 ms of input data before the first calculation can be done. Of course, the calculation time needs to be added to this delay. We cannot state this number precisely, since we employed Matlab for our simulations and the calculation time therefore depends on the speed of the computer used. Instead, we provide the reader with a way to calculate this number for a particular microprocessor system whose processor speed is known. This simply requires knowledge of the computational complexity of the system, which is provided in section 4.2.3 of this report.

    4.2.3 Computational Complexity


Again, the same parameters as stated in section 4.2.1 are fixed for the calculations. All numbers show the multiplications or additions required per frame; this number needs to be multiplied by the number of frames of a given speech signal.

Calculation of the LPC coefficients: the Levinson-Durbin recursion requires O(p^2) floating-point operations (FLOPS). In our case p = 18, hence this step requires 324 FLOPS per frame.

The pre-emphasis filter needs 480 additions and 480 multiplications, which is equal to 960 operations.

The cross-correlation consists of 480 additions and 480 multiplications, which is equal to 960 operations.

The reconstruction of the LPC needs about 480*18 additions and 480*18 multiplications, which is equal to 17280 operations.

The inverse filter requires again 480 additions and 480 multiplications, which is equal to 960 operations.

Hence, the total number of operations for the plain LPC vocoder is 20484 operations per frame. The sentences in section 4.2.4 typically contain about 150 frames at 50 frames/second. Thus, the computational complexity of the plain LPC vocoder is about 1 MFLOPS (mega-FLOPS).

For the voice-excited vocoder the calculation of the cross-correlation is not needed; the discrete cosine transform and its inverse are needed instead. However, in the Matlab code used, the cross-correlation is still computed as was the case for the plain LPC vocoder; therefore, the complexity is slightly increased.

The DCT (if the fast algorithm is applied) requires 480 multiplications, equaling 480 operations.

The inverse DCT requires the same number of operations, namely 480 operations.

The total number of FLOPS for the voice-excited LPC vocoder is therefore 21444 operations per frame. If we consider the same parameters as before, the computational complexity is roughly 1.07 MFLOPS. The improved sound quality makes up for the higher number of FLOPS.
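The per-frame totals quoted above can be verified with a few lines of Matlab arithmetic:

p = 18; N = 480; % predictor order and window length in samples
ops_lpc = p^2 + 2*N + 2*N + 2*N*p + 2*N % = 20484 ops/frame: Levinson-Durbin,
% pre-emphasis, cross-correlation, LPC reconstruction, inverse filter
ops_velpc = ops_lpc + 480 + 480 % = 21444 ops/frame: plus DCT and inverse DCT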

    4.2.4 Objective Performance Evaluation

We measured the segmental signal to noise ratio (SEGSNR) of the original speech file compared to the coded and reconstructed speech file, using the provided Matlab function "segsnr". The obtained results are as follows:


    1) A Male speaker saying: "Kick the ball straight and follow through."

    2) A Female speaker saying: "It's easy to tell the depth of a well."

    3) A Male speaker saying: "A pot of tea helps to pass the evening."

    4) A Female speaker saying: "Glue the sheet to the dark blue background."

Vocoder type          SNR 1       SNR 2       SNR 3       SNR 4
Plain LPC             -24.92 dB   -24.85 dB   -24.87 dB   -23.94 dB
Voice-excited LPC     0.5426 dB   0.7553 dB   0.5934 dB   0.2319 dB

    Note that the calculation of the SNR requires the signals to be normalized before the

    ratio can be calculated.
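For reference, a minimal sketch of a segmental SNR computation of this kind is given below; the actual measurements used the provided segsnr function, whose exact implementation may differ. x is the original signal, y the reconstructed one, and framelen the segment length in samples:

function snr = segsnr_sketch(x, y, framelen)
x = x / max(abs(x)); % normalize both signals, as noted above
y = y / max(abs(y));
nseg = floor(length(x) / framelen);
snr = 0;
for k = 1:nseg
  idx = (k-1)*framelen + (1:framelen);
  e = x(idx) - y(idx); % coding error within this segment
  snr = snr + 10*log10(sum(x(idx).^2) / sum(e.^2));
end
snr = snr / nseg; % average over all segments, in dB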

    5 Discussion of Results

Sentence                                       Original file   LPC encoded      Voice-excited LPC
"A pot of tea helps to pass the evening"       s2omwb.wav      s2omwb_lpc.wav   s2omwb_velpc.wav
"Kick the ball straight and follow through"    s1omwb.wav      s1omwb_lpc.wav   s1omwb_velpc.wav
"Glue the sheet to the dark blue background"   s2ofwb.wav      s2ofwb_lpc.wav   s2ofwb_velpc.wav
"It's easy to tell the depth of a well"        s1ofwb.wav      s1ofwb_lpc.wav   s1ofwb_velpc.wav

Links to the sounds (original and coded) using both methods. The first two sentences are spoken by a male speaker, the last two by a female speaker.

    5.1 Quality

    5.1.1 Subjective quality

The original speech sentences were compared against the LPC reconstructed speech and the voice-excited LPC reconstructed speech. In both cases, the reconstructed speech has a lower quality than the input speech sentences. Both reconstructed signals sound mechanized and noisy, with the output of the plain LPC vocoder being nearly unintelligible. The LPC reconstructed speech sounds guttural, with a lower pitch than the original sound, and seems whispered; the noisy impression is very strong. The voice-excited LPC reconstructed file sounds more spoken and less


whispered. The guttural quality is also weaker and the words are much easier to understand. Overall, the speech reconstructed using voice-excited LPC sounded better, but still muffled.

The waveforms in Fig. 5-1 give the same impression: the voice-excited waveform looks closer to the original sound than the plain LPC reconstructed one.

    5.1.2 Segmental signal to noise ratio

Looking at the segmental SNR computed in section 4.2.4, it is obvious that the plain LPC sound is very noisy, having a negative SNR: the noise in this file is even stronger than the actual signal. The voice-excited LPC encoded sound is far better, and its SNR, although barely, is on the positive side. However, even the speech coded with the improved voice-excited LPC does not sound exactly like the original signal.

It is noticeable that both the plain LPC and the voice-excited vocoders are not sensitive to the input sentence: the result is the same for a sentence with many voiced sounds and for a sentence with many fricatives or other unvoiced sounds. The good point is that any spoken sentence can be transmitted with the same overall results. The disadvantage is that there is no specific aspect of the vocoder that gives much poorer results than the rest and could be targeted: to improve the quality, the overall system has to be improved. We cannot just improve the production of unvoiced sounds to make the vocoder sound perfect.

5.2 Quality-performance tradeoffs

The LPC method of transmitting speech sounds has some very good aspects as well as some drawbacks. The huge advantage of vocoders is a very low bit rate compared to what is otherwise needed for sound transmission. On the other hand, the speech quality achieved is quite poor.


    Fig. 5-1: Waveform of the sentence "A pot of tea helps to pass the evening": a) original speech signal, b) LPC

    reconstructed speech signal, c) voice-excited LPC reconstructed speech signal

    5.2.1 Bit rate performance

The achieved bit rates of both methods are quite low, both under the required 16 kbps. However, the voice-excited LPC coding requires a bandwidth twice as large as the plain LPC coding. This large increase buys a better, though still not perfect, sound, while the computational complexity remains roughly the same.


    Fig. 5-2: Speech quality vs. Bit rate trade-offs for different speech coding techniques

In the plain LPC vocoder, one tries to estimate the pitch and then excite the synthesizer with the estimated parameters. This results in a poor, almost unintelligible sentence: the lack of accuracy in determining the pitch and the exact excitation results in a huge degradation of the quality. Shifting to the voice-excited LPC technique, the pitch and the binary choice of excitation method are dropped (7 bits per frame), but all the prediction errors have to be sent.

The increase in the bit rate thus results from a change in the excitation method. Whereas an impulse train was previously used as the source for the transfer function, the actual error made when computing the a_k's is now encoded and sent. Theoretically, a close to perfect reconstruction could be achieved if these errors were sent as floating-point numbers. However, this would require a very large bandwidth: using 8 bits per error over the whole frame would give 3840 bits per frame and contribute 192 kbps to the overall bit rate. The errors are therefore compressed using an algorithm similar to the ones used in JPEG or MPEG compression. Taking the discrete cosine transform (DCT) and keeping only the first 40 coefficients (each quantized with 4 bits) allows a pretty good reconstruction of all 480 errors in the frame. The maximum energies are located in the first few coefficients, so the last 440 can be assumed to be zero.
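A quick check of these numbers at 50 frames per second:

raw_rate = 480 * 8 * 50 % = 192000 bps if every error is sent with 8 bits
dct_rate = 40 * 4 * 50 % = 8000 bps for the compressed residual alone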

However, we noticed that increasing the bit rate of a vocoder is not the best idea, since the improvement in quality is not linear, as can be seen in Fig. 5-2, provided in [6]. While an increase in the required bandwidth significantly improves the quality at the


beginning, the increase required thereafter for the same amount of improvement is tremendously larger.

    5.2.2 Delay and computational complexity

The overall delay of the system is hard to measure and depends on the machine used. However, it can be estimated by looking at the time between the launch of the program and the creation of the output file.

Both methods employed are of the same computational complexity. The voice-excited LPC method uses the original sound samples to produce the output sound, while the plain LPC technique creates the output sound from more basic characteristics. The stronger link to the original signal in the voice-excited LPC method allows a more accurate reproduction of the sounds without increasing the complexity or the delay, since both concepts are closely linked.

As mentioned before, one idea to improve the overall quality could be, besides an increase in the required bandwidth, an increase in the vocoder complexity to transmit more, or more pertinent, information within the same bandwidth. The overall delay of the system would then increase, but our vocoder complexity is low enough to allow a complexity increase and still meet the project requirements regarding the FLOPS.

    6 Conclusions

The results achieved with the voice-excited LPC are intelligible. On the other hand, the plain LPC results are much poorer and barely intelligible. This first implementation gives an idea of how a vocoder works, but the result is far below what can be achieved using other techniques. Nonetheless, the voice-excited LPC used here gives understandable results even though it is not optimized. The tradeoffs between quality on one side and bandwidth and complexity on the other side clearly appear here. If we want a better quality, the complexity of the system should be increased or a larger bandwidth has to be used.

Since the voice-excited LPC gives pretty good results within all the required limitations of this project, we could try to improve it. A major improvement could come from the compression of the errors: if we can send them in a lossless manner to the synthesizer, the reconstruction will be perfect. An idea could be the use of Huffman codes for the DCT coefficients; many simulations would have to be done to get the right code book.


This would reduce the bit rate, so that the additional amount of bandwidth could be used to improve quality. At least two possibilities could be considered. The first would be an increase in the number of bits used to quantize the DCT coefficients: the first coefficients would be more accurate, resulting in closer reconstructed errors after the inverse DCT. The second would be to increase the number of quantized coefficients; the result would be of the same kind, a more accurate reconstructed error array. The point is to know up to what point an improvement in one direction is better than in the other, since both should be improved to get a perfect file. Other kinds of coding techniques could also be considered; all these methods would result in a complexity increase, but the vocoder is simple enough to cope with it.

Another way of improving the vocoder would be to look at the plain LPC vocoder and try to implement the covariance method. However, since there exists no fast algorithm for inverting the covariance matrix, the computational complexity, as well as the delay, can increase tremendously.

Finally, the excitation parameters, the weakest part of this implementation, could be looked at. Not all unvoiced sounds can result from the same white Gaussian noise input; an analysis of this, and the creation of a code book for unvoiced sounds, could give better results. Again, statistical data and numerous simulations would be needed.

    7 References

[1] http://www.data-compression.com/speech.html

[2] http://www.bell-labs.com

[3] http://cslu.cse.ogi.edu/HLTsurvey/ch10node4.html

    [4] M. H. Johnson and A. Alwan, "Speech Coding: Fundamentals and Applications",

    to appear as a chapter in the Encyclopedia of Telecommunications, Wiley, December

    2002.

[5] http://www.ti.com/corp/docs/company/history/pmos.shtml

[6] http://www-mobile.ecs.soton.ac.uk

    [7] L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signals",

Prentice-Hall, Englewood Cliffs, NJ, 1978.


[8] B. S. Atal, M. R. Schroeder, and V. Stover, "Voice-Excited Predictive Coding System for Low Bit-Rate Transmission of Speech", Proc. ICC, pp. 30-37 to 30-40, 1975.

    [9] C. J. Weinstein, "A Linear Predictive Vocoder with Voice Excitation", Proc.

Eascon, September 1975.

[10] Auditory toolbox: http://rvl4.ecn.purdue.edu/~malcolm/interval/1998-010/

[11] COLEA: Software Tool for Speech Analysis: http://www.utdallas.edu/~loizou/speech/colea.htm

[12] G. C. Orsak et al., "Collaborative SP education using the Internet and MATLAB", IEEE Signal Processing Magazine, Nov. 1995, vol. 12, no. 6, pp. 23-32.

    8 Appendix

The Matlab code used can be found in this section. The first file, in section 8.1, is the main file that is executed; it calls the other files to code and decode the sentences. Two different vocoders are used: speechcoder1 is the plain LPC vocoder and speechcoder2 is the voice-excited one. The same code, in section 8.2, is used to generate the LPC parameters for both implementations; however, each vocoder synthesizer uses only part of the generated parameters as its inputs. The first synthesizer, in section 8.3.2, uses the pitch and the voiced/unvoiced switch information. The second synthesizer, in section 8.4.2, uses the residual errors compressed using a DCT, quantized and decoded; these DCT coding and decoding steps are in the main speechcoder2.m file, which is in section 8.4.1. The LPC generation and the synthesizer are based on previously written code [12], modified and adapted to our project.

8.1 Main file

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

    % EE214A - Digital Speech Processing - Class Project, Winter 2002

    %% Speech Coding using Linear Predictive Coding (LPC)

    %

    % Author: Philipp Steurer, 03/04/2002

    %

    % Project team: Daniel Thell, Ozgu Ozun, Philipp Steurer

    %

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%


    clc; % clear the command line

    clear all; % clear the workspace

    %

    % system constants

    % ---------------

InputFilename = 's2omwb.wav';
[inspeech, Fs, bits] = wavread(InputFilename); % read the wavefile

    outspeech1 = speechcoder1(inspeech);

    outspeech2 = speechcoder2(inspeech);

    % display the results

    figure(1);

    subplot(3,1,1);

    plot(inspeech);

    grid;

    subplot(3,1,2);

plot(outspeech1);
grid;

    subplot(3,1,3);

    plot(outspeech2);

    grid;

    disp('Press a key to play the original sound!');

    pause;

    soundsc(inspeech, Fs);

    disp('Press a key to play the LPC compressed sound!');

    pause;

    soundsc(outspeech1, Fs);

    disp('Press a key to play the voice-excited LPC compressed sound!');

    pause;

    soundsc(outspeech2, Fs);

8.2 LPC output information generation

function [aCoeff,resid,pitch,G,parcor,stream] = proclpc(data,sr,L,fr,fs,preemp)

% USAGE: [aCoeff,resid,pitch,G,parcor,stream] = proclpc(data,sr,L,fr,fs,preemp)

    %

    % This function computes the LPC (linear-predictive coding) coefficients that

    % describe a speech signal. The LPC coefficients are a short-time measure of

    % the speech signal which describe the signal as the output of an all-pole

    % filter. This all-pole filter provides a good description of the speech

    % articulators; thus LPC analysis is often used in speech recognition and

    % speech coding systems. The LPC parameters are recalculated, by default in

    % this implementation, every 20ms.

    %

    % The results of LPC analysis are a new representation of the signal

    % s(n) = G e(n) - sum from 1 to L a(i)s(n-i)


    if (nargin


    % response (to check above).

    if 0

    impulseResponse = filter(1, aCoeff(:,nframe), [1 zeros(1,255)]);

    freqResp = 20*log10(abs(fft(impulseResponse)));

    plot(freqResp);

    end

    errSig = filter([1 A'],1,frameData); % find excitation noise

    G(nframe) = sqrt(err(L+1)); % gain

    autoCorErr = xcorr(errSig); % calculate pitch & voicing information

    [B,I] = sort(autoCorErr);

    num = length(I);

    if B(num-1) > .01*B(num)

    pitch(nframe) = abs(I(num) - I(num-1));

    else

    pitch(nframe) = 0;

    end

    % calculate additional info to improve the compressed sound quality

    resid(:,nframe) = errSig/G(nframe);

    if(frameIndex==1) % add residual frames using a trapezoidal window

    stream = resid(1:msfr,nframe);

    else

    stream = [stream;

    overlap+resid(1:msoverlap,nframe).*ramp;

    resid(msoverlap+1:msfr,nframe)];

    end

    if(frameIndex+msfr+msfs-1 > duration)

    stream = [stream; resid(msfr+1:msfs,nframe)];

    else

    overlap = resid(msfr+1:msfs,nframe).*flipud(ramp);

    end

end
stream = filter(1, [1 -preemp], stream)';

8.3 Plain LPC vocoder

    8.3.1 Main file

function [ outspeech ] = speechcoder1( inspeech )

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

    %

% EE214A - Digital Speech Processing - Class Project, Winter 2002
%

    % Speech Coding using Linear Predictive Coding (LPC)

    % The desired order can be selected in the system constants section.

% For the excitation, impulse trains are used. The result does not sound very
% good, but with this solution it is possible to achieve a low bitrate!

    %

    % Author: Philipp Steurer, 03/04/2002

    %


    % Parameters:

    % inspeech : wave data with sampling rate Fs

    % (Fs can be changed underneath if necessary)

    %

    % Returns:

    % outspeech : wave data with sampling rate Fs

    % (coded and resynthesized)

    %

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

    %

    % arguments check

    % ---------------

    if ( nargin ~= 1)

    error('argument check failed');

    end;

    %

    % system constants

    % ----------------

    Fs = 16000; % sampling rate in Hertz (Hz)

    Order = 10; % order of the model used by LPC

    %

    % main

    % ----

% encode the speech using LPC

    [aCoeff, resid, pitch, G, parcor, stream] = proclpc(inspeech, Fs, Order);

    % decode/synthesize speech using LPC and impulse-trains as excitation

    outspeech = synlpc1(aCoeff, pitch, Fs, G);

    8.3.2 Plain LPC decoder

function synWave = synlpc1(aCoeff,pitch,sr,G,fr,fs,preemp)

% USAGE: synWave = synlpc1(aCoeff,pitch,sr,G,fr,fs,preemp);

    %

    % This function synthesizes a (speech) signal based on a LPC (linear-

    % predictive coding) model of the signal. The LPC coefficients are a

    % short-time measure of the speech signal which describe the signal as the

    % output of an all-pole filter. This all-pole filter provides a good

    % description of the speech articulators; thus LPC analysis is often used in

    % speech recognition and speech coding systems. The LPC analysis is done

    % using the proclpc routine. This routine can be used to verify that the

% LPC analysis produces the correct answer, or as a synthesis stage after
% first modifying the LPC model.

    %

    % The results of LPC analysis are a new representation of the signal

    % s(n) = G e(n) - sum from 1 to L a(i)s(n-i)

    % where s(n) is the original data. a(i) and e(n) are the outputs of the LPC

    % analysis with a(i) representing the LPC model. The e(n) term represents

    % either the speech source's excitation, or the residual: the details of the

    % signal that are not captured by the LPC coefficients. The G factor is a

    % gain term.


    %

    % LPC synthesis produces a monaural sound vector (synWave) which is

    % sampled at a sampling rate of "sr". The following parameters are mandatory

    % aCoeff - The LPC analysis results, a(i). One column of L+1 numbers for each

    % frame of data. The number of rows of aCoeff determines L.

    % G - The LPC gain for each frame.

    % pitch - A frame-by-frame estimate of the pitch of the signal, calculated

    % by finding the peak in the residual's autocorrelation for each frame.

    %

    % The following parameters are optional and default to the indicated values.

    % fr - Frame time increment, in ms. The LPC analysis is done starting every

    % fr ms in time. Defaults to 20ms (50 LPC vectors a second)

    % fs - Frame size in ms. The LPC analysis is done by windowing the speech

    % data with a rectangular window that is fs ms long. Defaults to 30ms

    % preemp - This variable is the epsilon in a digital one-zero filter which

    % serves to preemphasize the speech signal and compensate for the 6dB

    % per octave rolloff in the radiation function. Defaults to .9378.

    %

    % This code was graciously provided by:

    % Delores Etter (University of Colorado, Boulder) and

% Professor Geoffrey Orsak (Southern Methodist University)
% It was first published in

    % Orsak, G.C. et al. "Collaborative SP education using the Internet and

    % MATLAB" IEEE SIGNAL PROCESSING MAGAZINE Nov. 1995. vol.12, no.6, pp.

    % 23-32.

    % Modifications by Philipp Steurer:

    % Using impulse-trains for the voice excitation.

    % (c) 1998 Interval Research Corporation

    % A more complete set of routines for LPC analysis can be found at

    % http://www.ee.ic.ac.uk/hp/staff/dmb/voicebox/voicebox.html

    if (nargin < 5), fr = 20; end;

if (nargin < 6), fs = 30; end;
if (nargin < 7), preemp = .9378; end;

    msfs = round(sr*fs/1000); % framesize in samples

    msfr = round(sr*fr/1000); % framerate in samples

    msoverlap = msfs - msfr;

    ramp = [0:1/(msoverlap-1):1]';

    [L1 nframe] = size(aCoeff); % L1 = 1+number of LPC coeffs

    for frameIndex=1:nframe

    A = aCoeff(:,frameIndex);

    % first check if it is voiced or unvoiced sound:

    if ( pitch(frameIndex) ~= 0 )

t = 0 : 1/sr : fs*10^(-3); % time axis for one frame (fs ms at sample freq. sr)
d = 0 : 1/pitch(frameIndex) : 1; % 1/pitch-freq. repetition freq.

    residFrame = (pulstran(t, d, 'tripuls', 0.001))'; % sawtooth width of 0.001s

    residFrame = residFrame + 0.01*randn(msfs+1,1);

    else

    residFrame = [];

    for m = 1:msfs

    residFrame = [residFrame; randn];

    end % for

    end;


synFrame = filter(G(frameIndex), A', residFrame); % synthesize speech from LPC coeffs

if(frameIndex==1) % add synthesized frames using a trapezoidal window

    synWave = synFrame(1:msfr);

    else

    synWave = [synWave; overlap+synFrame(1:msoverlap).*ramp; ...

    synFrame(msoverlap+1:msfr)];

    end

    if(frameIndex==nframe)

    synWave = [synWave; synFrame(msfr+1:msfs)];

    else

    overlap = synFrame(msfr+1:msfs).*flipud(ramp);

    end

    end;

    synWave = filter(1, [1 -preemp], synWave);

    8.4 Voice-excited LPC vocoder

    8.4.1 Main File

function [ outspeech ] = speechcoder2( inspeech )

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

    %

    % EE214A - Digital Speech Processing - Class Project, Winter 2002

    %

    % Speech Coding using Linear Predictive Coding (LPC)

    % The desired order can be selected in the system constants section.

    % For the excitation the residual signal is used. In order to decrease the

    % bitrate, the residual signal is discrete cosine transformed and then

    % compressed. This means only the first 50 coefficients of the DCT are kept.

    % While most of the energy of the signal is stored there, we don't lose a lot

% of information.
%

    % Author: Philipp Steurer, 03/05/2002

    %

    % Parameters:

    % inspeech : wave data with sampling rate Fs

    % (Fs can be changed underneath if necessary)

    %

    % Returns:

    % outspeech : wave data with sampling rate Fs

    % (coded and resynthesized)

    %

    %%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

    %

    % arguments check

    % ---------------

    if ( nargin ~= 1)

    error('argument check failed');

    end;


    %

    % system constants

    % ----------------

    Fs = 16000; % sampling rate in Hertz (Hz)

    Order = 10; % order of the model used by LPC

    %

    % main

    % ----

% encode the speech using LPC

    [aCoeff, resid, pitch, G, parcor, stream] = proclpc(inspeech, Fs, Order);

    % perform a discrete cosine transform on the residual

    resid = dct(resid);

    [a,b] = size(resid);

% only use the first 50 DCT coefficients; this can be done

    % because most of the energy of the signal is conserved in these coeffs

    resid = [ resid(1:50,:); zeros(430,b) ];

    % quantize the data

    resid = uencode(resid,4);

    resid = udecode(resid,4);

    % perform an inverse DCT

    resid = idct(resid);

    % add some noise to the signal to make it sound better

    noise = [ zeros(50,b); 0.01*randn(430,b) ];

    resid = resid + noise;

% decode/synthesize speech using LPC and the compressed residual as excitation

    outspeech = synlpc2(aCoeff, resid, Fs, G);

    8.4.2 Voice-excited LPC decoder

function synWave = synlpc2(aCoeff,source,sr,G,fr,fs,preemp)

% USAGE: synWave = synlpc2(aCoeff,source,sr,G,fr,fs,preemp);

    %

    % This function synthesizes a (speech) signal based on a LPC (linear-

    % predictive coding) model of the signal. The LPC coefficients are a

    % short-time measure of the speech signal which describe the signal as the

    % output of an all-pole filter. This all-pole filter provides a good

    % description of the speech articulators; thus LPC analysis is often used in

    % speech recognition and speech coding systems. The LPC analysis is done

    % using the proclpc routine. This routine can be used to verify that the

    % LPC analysis produces the correct answer, or as a synthesis stage after

    % first modifying the LPC model.

    %

    % The results of LPC analysis are a new representation of the signal

    % s(n) = G e(n) - sum from 1 to L a(i)s(n-i)

    % where s(n) is the original data. a(i) and e(n) are the outputs of the LPC


    postFilter = 1; resid = source;

    end

    [row col] = size(resid);

    if col