Pattern Comparison Techniques

By Sarita Jondhale


Pattern Comparison Techniques

The output of the front-end spectral analysis is in the form of vectors.

The test pattern T is a set containing many vectors.

The reference pattern R is likewise a set containing many vectors.

The goal of the pattern comparison stage is to determine the dissimilarity of each vector in T to each vector in R.

The reference pattern selected should be the one with minimum dissimilarity to the test pattern.
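
The vector-by-vector comparison described above can be sketched as a table of local dissimilarities. This is only an illustration: the slides do not say which dissimilarity measure is used, so the Euclidean distance, the function names, and the toy vectors below are all assumptions.

```python
import math

def euclidean(u, v):
    """Euclidean distance between two equal-length spectral vectors (assumed measure)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def local_distance_matrix(test, ref, dist=euclidean):
    """Return D where D[i][j] is the dissimilarity of test[i] to ref[j]."""
    return [[dist(t, r) for r in ref] for t in test]

# Hypothetical toy patterns: 3 test vectors and 2 reference vectors, 2 dimensions each.
T = [[0.0, 0.0], [1.0, 0.0], [3.0, 4.0]]
R = [[0.0, 0.0], [3.0, 4.0]]
D = local_distance_matrix(T, R)
```

Each row of D belongs to one vector of T and each column to one vector of R, so T and R may contain different numbers of vectors.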


Pattern Comparison Techniques

To determine the global similarity of T and R, the following problems must be considered:

T and R are generally of unequal length in time, because speaking rates differ across talkers.

T and R need not line up in time in any simple or well-prescribed manner, because different sounds cannot be varied in duration to the same degree. Vowels are easily lengthened or shortened, but most consonants cannot change much in duration.

A way to compare pairs of spectral vectors is needed.
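
The slides do not name a technique for handling these problems at this point. One common choice, shown here purely as an assumption, is dynamic time warping: it accumulates local vector distances along the best monotonic alignment, so patterns of unequal length can be compared. The function names, the Euclidean local distance, and the toy data are hypothetical.

```python
import math

def euclidean(u, v):
    """Local distance between two spectral vectors (assumed measure)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def dtw_distance(test, ref, dist=euclidean):
    """Accumulated dissimilarity along the best monotonic alignment of test to ref."""
    n, m = len(test), len(ref)
    inf = float("inf")
    # acc[i][j]: cost of the best alignment of test[:i] with ref[:j]
    acc = [[inf] * (m + 1) for _ in range(n + 1)]
    acc[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(test[i - 1], ref[j - 1])
            # A vector may repeat in either pattern (stretch) or both may advance.
            acc[i][j] = d + min(acc[i - 1][j], acc[i][j - 1], acc[i - 1][j - 1])
    return acc[n][m]

# Hypothetical 1-dimensional vectors: the same sound sequence, with the
# vowel-like middle value held longer in the test pattern.
R = [[1.0], [5.0], [2.0]]
T = [[1.0], [5.0], [5.0], [5.0], [2.0]]
print(dtw_distance(T, R))  # 0.0
```

Although T has five vectors and R has three, the accumulated dissimilarity is zero because the lengthened middle portion of T aligns with a single vector of R.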


Speech Detection

Also called end point detection.

The goal of speech detection is to separate the speech signal from the background signal.

The need for speech detection arises in many applications in telecommunications.

For automatic speech recognition, end point detection is required to isolate the speech of interest so that a speech pattern or template can be created.


Speech Detection

Speech must be detected so as to provide the best patterns for recognition.

The best patterns are those that provide the highest recognition accuracy.


Speech Detection

Accurate detection of speech is a simple problem when speech is produced in a relatively noise-free environment.

It becomes a difficult task when the environment is noisy.


Speech Detection

First factor: during speech, the talker produces sounds such as lip smacks, heavy breathing, and mouth clicks.

Mouth click with speaking: the mouth click is produced by opening the lips before or after speaking. The click noise is separate from the speech signal, and its energy level is comparable to that of the speech signal.


Speech Detection

Heavy breathing with speaking: unlike the mouth click, the heavy breathing noise is not separated from the speech, which makes accurate end point detection quite difficult.


Speech Detection

Second factor: environmental noise.

The ideal environment for talking is a quiet room with no acoustic noise or signal generators other than the speaker.


Speech Detection

The ideal environment is not possible in practice. We have to consider speech produced:

In noisy backgrounds (fans, machinery)

In non-stationary environments (door slams, irregular road noise, car horns)

With speech interference (as from TV, radio, or background conversations)

In hostile circumstances (when the speaker is stressed)


Speech Detection

These interfering signals are somewhat like speech signals, so accurate end point detection becomes difficult.


Speech Detection

Third factor: distortion introduced by the transmission system over which the speech signal is sent.


Speech Detection

Methods for speech detection are broadly classified into three approaches:

The explicit approach

The implicit approach

The hybrid approach


The explicit approach

The speech signal is first processed and feature measurements are made.

The speech detection method is then applied to locate and define the speech events.

The detected speech is sent to the pattern comparison algorithm, and finally the decision mechanism chooses the recognized word.


The explicit approach

For signals with a stationary, low-level noise background, this approach produces reasonably good detection accuracy.

The approach often fails when the environment is noisy or the interference is non-stationary.
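
A minimal sketch of explicit detection, assuming the measured feature is short-time energy and the detection rule is a fixed threshold. The slides do not specify either; the function names, frame length, threshold, and toy signal are hypothetical.

```python
def frame_energies(samples, frame_len):
    """Mean squared amplitude of consecutive non-overlapping frames."""
    return [
        sum(x * x for x in samples[i:i + frame_len]) / frame_len
        for i in range(0, len(samples) - frame_len + 1, frame_len)
    ]

def detect_end_points(energies, threshold):
    """Return (first, last) indices of frames whose energy exceeds threshold, or None."""
    active = [i for i, e in enumerate(energies) if e > threshold]
    if not active:
        return None
    return active[0], active[-1]

# Hypothetical signal: low-level stationary background, a louder burst, background again.
signal = [0.01] * 40 + [0.5, -0.5] * 20 + [0.01] * 40
energies = frame_energies(signal, frame_len=10)
end_points = detect_end_points(energies, threshold=0.01)  # (4, 7)
```

Only one pair of end points is produced, and it is fixed before any pattern matching takes place. A loud click ahead of the burst would also cross the threshold and move the start point, which illustrates why this kind of rule degrades in noisy or non-stationary conditions.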


The implicit approach

This approach considers the speech detection problem simultaneously with the pattern matching and recognition-decision process.

It recognizes that speech events are almost always accompanied by a certain acoustic background.


The implicit approach

The unmarked signal sequence is processed by the pattern matching module, in which all possible sets of end points are considered.

The decision mechanism provides an ordered list of the candidate words along with the corresponding speech locations.

The final result is the best candidate and its associated end points.



With the implicit method, the boundary locations can inherently differ depending on the word recognized (feedback).

With the explicit method, only a single choice of boundary locations is made.



Advantages & disadvantages

Requires heavy computation.

But offers higher detection accuracy than the explicit approach.
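
The implicit idea can be sketched as an exhaustive search over end point pairs carried out inside the matcher. Everything concrete here is an assumption for illustration: features are single numbers per frame, the matcher stretches the reference linearly, frames outside the chosen segment are scored against a constant background level, and all names are hypothetical.

```python
def segment_cost(segment, ref):
    """Summed frame distance after linearly stretching ref to the segment length."""
    n, m = len(segment), len(ref)
    return sum(abs(x - ref[(i * m) // n]) for i, x in enumerate(segment))

def implicit_detect(frames, templates, background=0.0, min_len=2):
    """Try every (start, end) pair for every word; return candidates ordered by cost."""
    candidates = []
    for word, ref in templates.items():
        best = None
        for start in range(len(frames)):
            for end in range(start + min_len, len(frames) + 1):
                # Frames outside the segment must look like background.
                outside = frames[:start] + frames[end:]
                cost = segment_cost(frames[start:end], ref)
                cost += sum(abs(x - background) for x in outside)
                if best is None or cost < best[0]:
                    best = (cost, start, end)
        candidates.append((best[0], word, best[1], best[2]))
    return sorted(candidates)

# Hypothetical unmarked sequence and a two-word vocabulary.
frames = [0.0, 0.0, 1.0, 5.0, 2.0, 0.0, 0.0]
templates = {"a": [1.0, 5.0, 2.0], "b": [4.0, 4.0]}
ranked = implicit_detect(frames, templates)
```

On this toy input the best candidate is word "a" with end points (2, 5), end exclusive, while word "b" is placed at (3, 5), so the boundaries depend on the word hypothesis. The nested loops over every start and end for every word are where the heavy computation comes from.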


The hybrid approach

This is a combination of the implicit and explicit approaches.

It uses the explicit method to obtain several sets of end points for recognition processing, and the implicit method to choose among the alternatives.

As in the implicit approach, the decision box provides the most likely candidate word and the corresponding end points.


The hybrid approach

The computational load is equivalent to that of the explicit method.

The accuracy is comparable to that of the implicit method.
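
A minimal sketch of the hybrid scheme, under assumptions the slides do not state: the explicit stage is an energy threshold applied at a few levels to produce a handful of end point sets, and the matcher is a simple linear-stretch frame distance. All names, thresholds, and data are hypothetical.

```python
def explicit_candidates(energies, thresholds):
    """One (start, end) frame pair per threshold; end is exclusive."""
    pairs = []
    for th in thresholds:
        active = [i for i, e in enumerate(energies) if e > th]
        if active:
            pair = (active[0], active[-1] + 1)
            if pair not in pairs:
                pairs.append(pair)
    return pairs

def segment_cost(segment, ref):
    """Average frame distance after linearly stretching ref to the segment length."""
    n, m = len(segment), len(ref)
    return sum(abs(x - ref[(i * m) // n]) for i, x in enumerate(segment)) / n

def hybrid_detect(frames, energies, templates, thresholds):
    """Score only the explicit candidates; return the best (cost, word, start, end)."""
    scored = []
    for start, end in explicit_candidates(energies, thresholds):
        for word, ref in templates.items():
            scored.append((segment_cost(frames[start:end], ref), word, start, end))
    return min(scored) if scored else None

# Hypothetical per-frame features and energies for the same utterance.
frames = [0.0, 0.0, 1.0, 5.0, 2.0, 0.0, 0.0]
energies = [0.01, 0.01, 0.2, 0.9, 0.3, 0.01, 0.01]
templates = {"a": [1.0, 5.0, 2.0], "b": [4.0, 4.0]}
result = hybrid_detect(frames, energies, templates, thresholds=[0.1, 0.25])
```

Here the matcher scores only two end point sets, (2, 5) and (3, 5), rather than every possible pair, yet the final choice of word and boundaries is still made jointly by the matching stage.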
