Speech Recognition
Ajay Iyer
Outline
- What is a Spectrogram?
- Types of Spectrogram
- Linguistic and Acoustic Categories
- Prosodic Analysis
- Pitch Estimation
What is a Spectrogram?
A spectrogram is a visual representation of an acoustic signal. It displays the amplitude of the signal's frequency components as they vary over time. Depending on the length of the Fourier analysis window, different frequency and time resolutions are achieved.
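Types of Spectrogram
A long analysis window resolves frequency at the expense of time, giving a “Narrowband spectrogram”.
A short analysis window, on the other hand, resolves time at the expense of frequency, hence the name “Wideband spectrogram”. The names refer to the bandwidth of the equivalent analysis filters, which is inversely proportional to the window length.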
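The trade-off can be seen by computing the spectrogram of the same signal with two window lengths. The sketch below is a minimal NumPy short-time Fourier transform, not code from the original slides; the function name `spectrogram`, the Hamming window, the 40 ms and 5 ms window lengths, the hop sizes and the 1 kHz test tone are all illustrative assumptions.

```python
import numpy as np

def spectrogram(x, fs, win_len, hop):
    """Magnitude spectrogram in dB from a Hamming-windowed short-time Fourier transform.

    win_len and hop are given in samples. Returns (times_s, freqs_hz, S_db),
    where S_db has shape (win_len // 2 + 1, n_frames).
    """
    window = np.hamming(win_len)
    n_frames = 1 + (len(x) - win_len) // hop
    frames = np.stack([x[i * hop : i * hop + win_len] * window for i in range(n_frames)])
    mag = np.abs(np.fft.rfft(frames, axis=1)).T          # one column per frame
    S_db = 20 * np.log10(mag + 1e-12)
    freqs = np.fft.rfftfreq(win_len, d=1 / fs)            # bin spacing = fs / win_len
    times = (np.arange(n_frames) * hop + win_len / 2) / fs  # frame centres
    return times, freqs, S_db

fs = 16000
t = np.arange(fs) / fs                  # 1 s test signal
x = np.sin(2 * np.pi * 1000 * t)        # 1 kHz tone

# Narrowband: 40 ms window -> 25 Hz frequency bins, but each column spans 40 ms.
tn, fn, Sn = spectrogram(x, fs, win_len=640, hop=160)
# Wideband: 5 ms window -> 200 Hz frequency bins, but each column spans only 5 ms.
tw, fw, Sw = spectrogram(x, fs, win_len=80, hop=40)
```

On speech, the long window's 25 Hz bins are fine enough to separate individual harmonics of the voice, while the short window's 5 ms frames are short enough to show individual glottal pulses as vertical striations.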
Linguistic/Acoustic Categories
Labeling phonemes with their linguistic and/or acoustic categories speeds up the search and decoding algorithms by discarding impossible and highly unlikely phoneme combinations.
Implementation: the given phoneme is compared against the different categories defined by the TIMIT lexicon.
The category thus obtained is displayed along with the phoneme, as shown in the following slide.
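A minimal Python sketch of the lookup just described: each TIMIT phone label maps to a broad category, and the category is shown next to the phone. The names `TIMIT_CATEGORIES`, `PHONE_TO_CATEGORY` and `label` are hypothetical, and the grouping only roughly follows the broad classes in the TIMIT phone-code documentation (closures and pauses are grouped separately here for illustration); the slides' own category table is not reproduced.

```python
# Broad phonetic classes for TIMIT phone labels. This grouping is an
# assumption for illustration, not the category table used in the slides.
TIMIT_CATEGORIES = {
    "stop": ["b", "d", "g", "p", "t", "k", "dx", "q"],
    "affricate": ["jh", "ch"],
    "fricative": ["s", "sh", "z", "zh", "f", "th", "v", "dh"],
    "nasal": ["m", "n", "ng", "em", "en", "eng", "nx"],
    "semivowel/glide": ["l", "r", "w", "y", "hh", "hv", "el"],
    "vowel": ["iy", "ih", "eh", "ey", "ae", "aa", "aw", "ay", "ah", "ao",
              "oy", "ow", "uh", "uw", "ux", "er", "ax", "ix", "axr", "ax-h"],
    "closure/silence": ["bcl", "dcl", "gcl", "pcl", "tcl", "kcl", "pau", "epi", "h#"],
}

# Reverse index: phone label -> category, built once for O(1) lookups.
PHONE_TO_CATEGORY = {
    phone: category
    for category, phones in TIMIT_CATEGORIES.items()
    for phone in phones
}

def label(phone):
    """Return the phone annotated with its broad category, e.g. 'sh (fricative)'."""
    category = PHONE_TO_CATEGORY.get(phone.lower(), "unknown")
    return f"{phone} ({category})"

# Annotate a short phone sequence (TIMIT labels for the word "she": sh iy).
print([label(p) for p in ["sh", "iy"]])  # ['sh (fricative)', 'iy (vowel)']
```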
Prosodic Analysis
Acoustically speaking, prosody refers to variations in syllable duration, loudness, pitch and the formant frequencies of the speech signal.
Prosodic features are suprasegmental, i.e. they are not restricted to any one segment of speech; they occur at higher levels of an utterance, such as the syllable, word or phrase.
For example: “No!”, “Don’t!”
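Two of the prosodic correlates listed above, loudness and duration, can be measured directly from the waveform. The sketch below tracks the short-time RMS level in dB and estimates how long the signal stays above a fixed level. The function names, the 25 ms frames with 10 ms hop at 16 kHz, and the -30 dB threshold are illustrative assumptions, not part of the original analysis.

```python
import numpy as np

def loudness_contour(x, fs, frame_len=400, hop=160):
    """Short-time RMS level in dB (relative to an amplitude of 1.0), one value per frame."""
    n_frames = 1 + (len(x) - frame_len) // hop
    frames = np.stack([x[i * hop : i * hop + frame_len] for i in range(n_frames)])
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    level_db = 20 * np.log10(rms + 1e-10)
    times = (np.arange(n_frames) * hop + frame_len / 2) / fs  # frame centres
    return times, level_db

def active_duration(level_db, hop, fs, threshold_db=-30.0):
    """Approximate duration (s) as the number of frames above threshold times the hop."""
    return np.count_nonzero(level_db > threshold_db) * hop / fs

# Synthetic utterance: 0.2 s silence, 0.3 s tone, 0.2 s silence.
fs = 16000
t = np.arange(int(0.3 * fs)) / fs
silence = np.zeros(int(0.2 * fs))
x = np.concatenate([silence, 0.5 * np.sin(2 * np.pi * 220 * t), silence])

times, level_db = loudness_contour(x, fs)
dur = active_duration(level_db, hop=160, fs=fs)
```

The estimated duration comes out slightly above 0.3 s, because frames that only partly overlap the tone still exceed the threshold; a shorter hop narrows that error.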
Pitch
Of the various prosodic features, the most important is pitch. Knowing it makes it possible to distinguish the contexts in which a word is spoken, namely alerting versus referential contexts.
Incorporating pitch information therefore increases the accuracy of the recognizer.
Implementation
The pitch.m file uses cepstral analysis to extract pitch information. It performs the analysis on a single analysis frame (segment); frame-based analysis built on it estimates the pitch over the entire speech signal. The estimated fundamental frequency (pitch) is assigned to the time instant

$$t_{\text{pitch}} = t_{\text{interval}}\,(\text{frameNum} - 1) + f_o / F_s$$
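pitch.m itself is not included in the slides, so the following is only a Python/NumPy sketch of the general cepstral method: window a frame, take the inverse FFT of its log magnitude spectrum (the real cepstrum), and pick the largest peak in the range of lags that correspond to plausible pitch periods; F0 is the sampling rate divided by that lag. The function names, the 50 to 400 Hz search range, the 1024-sample frames with a 256-sample hop, and the choice to time-stamp each estimate at its frame centre (instead of using the slide's t_pitch expression) are assumptions. Voiced/unvoiced detection is omitted.

```python
import numpy as np

def cepstral_pitch(frame, fs, f_min=50.0, f_max=400.0):
    """Estimate F0 (Hz) of one frame from the peak of its real cepstrum."""
    windowed = frame * np.hamming(len(frame))
    log_mag = np.log(np.abs(np.fft.fft(windowed)) + 1e-12)
    cepstrum = np.real(np.fft.ifft(log_mag))
    # Only search lags (quefrencies, in samples) whose pitch lies in [f_min, f_max].
    q_lo = int(fs / f_max)
    q_hi = int(fs / f_min)
    peak_lag = q_lo + np.argmax(cepstrum[q_lo : q_hi + 1])
    return fs / peak_lag

def pitch_track(x, fs, frame_len=1024, hop=256):
    """Frame-based pitch estimation over the whole signal."""
    n_frames = 1 + (len(x) - frame_len) // hop
    times = np.empty(n_frames)
    f0s = np.empty(n_frames)
    for i in range(n_frames):
        frame = x[i * hop : i * hop + frame_len]
        f0s[i] = cepstral_pitch(frame, fs)
        times[i] = (i * hop + frame_len / 2) / fs  # assumed: frame centre
    return times, f0s

def harmonic_tone(f0, dur, fs):
    """Test signal: harmonics of f0 up to just below fs/2, with amplitudes 1/k."""
    t = np.arange(int(dur * fs)) / fs
    n_harm = int((fs / 2) // f0) - 1
    return sum(np.sin(2 * np.pi * k * f0 * t) / k for k in range(1, n_harm + 1))

# 0.5 s at 200 Hz followed by 0.5 s at 125 Hz.
fs = 16000
x = np.concatenate([harmonic_tone(200, 0.5, fs), harmonic_tone(125, 0.5, fs)])
times, f0s = pitch_track(x, fs)
```

Frames that straddle the change from 200 Hz to 125 Hz give less reliable estimates. A practical tracker would add a voicing decision (for example, a threshold on the cepstral peak height) and some smoothing of the contour.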