speech recognition uthm

8/14/2019 Speech Recognition UTHM

1/30

Design of Automatic Speech RecognitionSystems for Voice Identification and

Analysis

Supervisor: Dr. Mohammed Faiz Liew bin AbdullahDr. Rozlan bin Alias

En.Ariffuddin bin Joret

DINESHWARAN GUNALAN HE120104TUAN MUSTAQIM HE120105

UNIVERSITY TUN HUSSEIN ONN (UTHM)


2/30


3/30

Advantages and Applications

Dictation

Command and control

Telephony

Medical/disabilities


4/30

How the Brain Work

Articulation produces

sound waves which the ear conveys to the brain

for processing


5/30

How the Computer Work

Digitization

Acoustic analysis of thespeech signal

Linguistic interpretation

coustic waveform Acoustic signal

Speech recognition


6/30

Challenges Faced

Ease of use

Robust performance

Automatic learning of new words and sounds

Grammar for spoken language

Control of synthesized voice quality

Integrated learning for speech recognition and synthesis

Getting a computer to understand spoken language

Digitization

Converting analogue signal into digital representation

Signal processing

Separating speech from background noise

Phonetics

Variability in human speech

Phonology

Recognizing individual sound distinctions (similar phonemes)


7/30

Digitization

Analogue to digital conversion Sampling and quantizing

Use filters to measure energy levels forvarious points on the frequency spectrum

Knowing the relative importance of differentfrequency bands (for speech) makes thisprocess more efficient

E.g. high frequency sounds are lessinformative, so can be sampled using abroader bandwidth (log scale)


8/30

Separating Speech from Background Noise

Noise cancelling microphones

Two mics, one facing speaker, the other facingaway

Ambient noise is roughly same for both mics Knowing which bits of the signal relate to speech

Spectrograph analysis


9/30

Variability in Individuals Speech

Variation among speakers due toVocal range (f0, and pitch rangesee later)

Voice quality (growl, whisper, physiologicalelements such as nasality, adenoidality, etc)

ACCENT !!! (especially vowel systems, but alsoconsonants, allophones, etc.)

Variation within speakers due to

Health, emotional stateAmbient conditions

Speech style: formal read vs spontaneous


10/30

Identifying Phonemes

Differences between some phonemesare sometimes very small

May be reflected in speech signal (eg

vowels have more or less distinctive f1 andf2)

Often show up in coarticulation effects(transition to next sound)

e.g. aspiration of voiceless stops in English

Allophonic variation


11/30

Disambiguating Homophones

Mostly differences are recognised by humansby context and need to make sense

Its hard to wreck a nice beach

What dimes a necks drain to stop port?

Systems can only recognize words that are intheir lexicon, so limiting the lexicon is anobvious ploy

Some ASR systems include a grammar whichcan help disambiguation


12/30

Interpreting Prosodic Features

Pitch, length and loudness are used toindicate stress

All of these are relative

On a speaker-by-speaker basisAnd in relation to context

Pitch and length are phonemic in some

languages


13/30

Pitch

Pitch contour can be extracted fromspeech signal

But pitch differences are relative

One mans high is another (wo)mans low Pitch range is variable

Pitch contributes to intonation

But has other functions in tone languages Intonation can convey meaning


14/30

Length

Length is easy to measure but difficult tointerpret

Again, length is relative

It is phonemic in many languages Speech rate is not constantslows down at the

end of a sentence


15/30

Loudness

Loudness is easy to measure butdifficult to interpret

Again, loudness is relative


16/30

Objectives of our Project

Design and implement an automatic voicerecognition system.

Design a GUI system.

Design an Interface circuit to blink LED

MATLAB program is used to implement theproject.


17/30

Speech Recognition

Main pr inciple

of Speech

Recognition

Identification Verification


18/30

Hardware


19/30

GUI Software


20/30

Input Speech Signals

There are silence at the beginning and at theend of the signals.

The word consists of two syllables.

ze ro


21/30

Frame Blocking

FrameBlocking Windowing FFT

Final Output

Continuous

speech

frame spectrum

Continuous speech signal is blocked into frames of Nsamples with adjacent frames being separated by M(M


22/30

Frame Blocking


Final Output

Continuous

speech

frame spectrum

Continuous speech signal is blocked into frames of Nsamples with adjacent frames being separated by M(M


23/30

After Frame Blocking

The speech signals were blocked into framesof N samples with overlap.


24/30

Windowing


Final Output

Continuous

speech

frame spectrum

Each individual frame will be windowed.

Hamming window is used in this project.


25/30

After Windowing


26/30


27/30

After Windowing


Final Output

Continuous

speech

frame spectrum

Convert each frame of N samples form timedomain into the frequency domain.


28/30


29/30

Spectrum


30/30

Thank You

For

Your Time

speech recognition uthm

Documents