speech recognition uthm

Upload: dineshwaran-daniel-gunalan

Post on 04-Jun-2018

226 views

Category:

Documents


0 download

TRANSCRIPT

  • 8/14/2019 Speech Recognition UTHM

    1/30

    Design of Automatic Speech RecognitionSystems for Voice Identification and

    Analysis

    Supervisor: Dr. Mohammed Faiz Liew bin AbdullahDr. Rozlan bin Alias

    En.Ariffuddin bin Joret

    DINESHWARAN GUNALAN HE120104TUAN MUSTAQIM HE120105

    UNIVERSITY TUN HUSSEIN ONN (UTHM)

  • 8/14/2019 Speech Recognition UTHM

    2/30

  • 8/14/2019 Speech Recognition UTHM

    3/30

    Advantages and Applications

    Dictation

    Command and control

    Telephony

    Medical/disabilities

  • 8/14/2019 Speech Recognition UTHM

    4/30

    How the Brain Work

    Articulation produces

    sound waves which the ear conveys to the brain

    for processing

  • 8/14/2019 Speech Recognition UTHM

    5/30

    How the Computer Work

    Digitization

    Acoustic analysis of thespeech signal

    Linguistic interpretation

    coustic waveform Acoustic signal

    Speech recognition

  • 8/14/2019 Speech Recognition UTHM

    6/30

    Challenges Faced

    Ease of use

    Robust performance

    Automatic learning of new words and sounds

    Grammar for spoken language

    Control of synthesized voice quality

    Integrated learning for speech recognition and synthesis

    Getting a computer to understand spoken language

    Digitization

    Converting analogue signal into digital representation

    Signal processing

    Separating speech from background noise

    Phonetics

    Variability in human speech

    Phonology

    Recognizing individual sound distinctions (similar phonemes)

  • 8/14/2019 Speech Recognition UTHM

    7/30

    Digitization

    Analogue to digital conversion Sampling and quantizing

    Use filters to measure energy levels forvarious points on the frequency spectrum

    Knowing the relative importance of differentfrequency bands (for speech) makes thisprocess more efficient

    E.g. high frequency sounds are lessinformative, so can be sampled using abroader bandwidth (log scale)

  • 8/14/2019 Speech Recognition UTHM

    8/30

    Separating Speech from Background Noise

    Noise cancelling microphones

    Two mics, one facing speaker, the other facingaway

    Ambient noise is roughly same for both mics Knowing which bits of the signal relate to speech

    Spectrograph analysis

  • 8/14/2019 Speech Recognition UTHM

    9/30

    Variability in Individuals Speech

    Variation among speakers due toVocal range (f0, and pitch rangesee later)

    Voice quality (growl, whisper, physiologicalelements such as nasality, adenoidality, etc)

    ACCENT !!! (especially vowel systems, but alsoconsonants, allophones, etc.)

    Variation within speakers due to

    Health, emotional stateAmbient conditions

    Speech style: formal read vs spontaneous

  • 8/14/2019 Speech Recognition UTHM

    10/30

    Identifying Phonemes

    Differences between some phonemesare sometimes very small

    May be reflected in speech signal (eg

    vowels have more or less distinctive f1 andf2)

    Often show up in coarticulation effects(transition to next sound)

    e.g. aspiration of voiceless stops in English

    Allophonic variation

  • 8/14/2019 Speech Recognition UTHM

    11/30

    Disambiguating Homophones

    Mostly differences are recognised by humansby context and need to make sense

    Its hard to wreck a nice beach

    What dimes a necks drain to stop port?

    Systems can only recognize words that are intheir lexicon, so limiting the lexicon is anobvious ploy

    Some ASR systems include a grammar whichcan help disambiguation

  • 8/14/2019 Speech Recognition UTHM

    12/30

    Interpreting Prosodic Features

    Pitch, length and loudness are used toindicate stress

    All of these are relative

    On a speaker-by-speaker basisAnd in relation to context

    Pitch and length are phonemic in some

    languages

  • 8/14/2019 Speech Recognition UTHM

    13/30

    Pitch

    Pitch contour can be extracted fromspeech signal

    But pitch differences are relative

    One mans high is another (wo)mans low Pitch range is variable

    Pitch contributes to intonation

    But has other functions in tone languages Intonation can convey meaning

  • 8/14/2019 Speech Recognition UTHM

    14/30

    Length

    Length is easy to measure but difficult tointerpret

    Again, length is relative

    It is phonemic in many languages Speech rate is not constantslows down at the

    end of a sentence

  • 8/14/2019 Speech Recognition UTHM

    15/30

    Loudness

    Loudness is easy to measure butdifficult to interpret

    Again, loudness is relative

  • 8/14/2019 Speech Recognition UTHM

    16/30

    Objectives of our Project

    Design and implement an automatic voicerecognition system.

    Design a GUI system.

    Design an Interface circuit to blink LED

    MATLAB program is used to implement theproject.

  • 8/14/2019 Speech Recognition UTHM

    17/30

    Speech Recognition

    Main pr inciple

    of Speech

    Recognition

    Identification Verification

  • 8/14/2019 Speech Recognition UTHM

    18/30

    Hardware

  • 8/14/2019 Speech Recognition UTHM

    19/30

    GUI Software

  • 8/14/2019 Speech Recognition UTHM

    20/30

    Input Speech Signals

    There are silence at the beginning and at theend of the signals.

    The word consists of two syllables.

    ze ro

  • 8/14/2019 Speech Recognition UTHM

    21/30

    Frame Blocking

    FrameBlocking Windowing FFT

    Final Output

    Continuous

    speech

    frame spectrum

    Continuous speech signal is blocked into frames of Nsamples with adjacent frames being separated by M(M

  • 8/14/2019 Speech Recognition UTHM

    22/30

    Frame Blocking

    FrameBlocking Windowing FFT

    Final Output

    Continuous

    speech

    frame spectrum

    Continuous speech signal is blocked into frames of Nsamples with adjacent frames being separated by M(M

  • 8/14/2019 Speech Recognition UTHM

    23/30

    After Frame Blocking

    The speech signals were blocked into framesof N samples with overlap.

  • 8/14/2019 Speech Recognition UTHM

    24/30

    Windowing

    FrameBlocking Windowing FFT

    Final Output

    Continuous

    speech

    frame spectrum

    Each individual frame will be windowed.

    Hamming window is used in this project.

  • 8/14/2019 Speech Recognition UTHM

    25/30

    After Windowing

  • 8/14/2019 Speech Recognition UTHM

    26/30

  • 8/14/2019 Speech Recognition UTHM

    27/30

    After Windowing

    FrameBlocking Windowing FFT

    Final Output

    Continuous

    speech

    frame spectrum

    Convert each frame of N samples form timedomain into the frequency domain.

  • 8/14/2019 Speech Recognition UTHM

    28/30

  • 8/14/2019 Speech Recognition UTHM

    29/30

    Spectrum

  • 8/14/2019 Speech Recognition UTHM

    30/30

    Thank You

    For

    Your Time