speech recognition seminar


SPEECH RECOGNITION

07-Feb-2013

Seminar by: Suraj Vitthal Gaikwad
Guided by: Prof. S. R. Lahane


Outline

Introduction
Speech Recognition Process
Types of Speech Recognition Systems
Algorithms
Applications
Advantages & Disadvantages
Future Scope
Conclusion


Introduction

Speech recognition is the process by which a computer (or any other type of machine) identifies spoken words.

In practice, it means talking to your computer and having it correctly understand what you are saying.

It offers an alternative to traditional methods of interacting with a computer, such as the keyboard and mouse.



Speech Recognition Process


Signal Processing: convert the audio wave into a sequence of feature vectors.
Speech Recognition: decode the sequence of feature vectors into a sequence of words.
Semantic Interpretation: determine the meaning of the recognized words.
Dialog Management: correct errors and help get the task done.
Response Generation: choose the words that maximize user understanding.
Speech Synthesis (Text to Speech): generate synthetic speech from a ‘marked-up’ word string.
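The first stage, converting the audio wave into feature vectors, can be sketched as below. This is a minimal illustration, not the front end used by any particular recognizer: it assumes mono samples given as a list of floats at a 16 kHz sample rate (so the hypothetical defaults of 400-sample frames and a 160-sample hop correspond to 25 ms and 10 ms), and it computes only two simple features per frame, log energy and zero-crossing rate, where real systems typically use richer features such as MFCCs.

```python
import math

def frame_signal(samples, frame_len, hop):
    """Split a 1-D signal into overlapping frames; a trailing partial frame is dropped."""
    return [samples[start:start + frame_len]
            for start in range(0, len(samples) - frame_len + 1, hop)]

def frame_features(frame):
    """Return a tiny 2-D feature vector for one frame: (log energy, zero-crossing rate)."""
    energy = sum(x * x for x in frame)
    log_energy = math.log(energy + 1e-10)  # small floor avoids log(0) on digital silence
    # Count sign changes between neighbouring samples
    crossings = sum(1 for a, b in zip(frame, frame[1:]) if (a >= 0) != (b >= 0))
    zcr = crossings / (len(frame) - 1)
    return (log_energy, zcr)

def signal_to_features(samples, frame_len=400, hop=160):
    """Audio wave -> sequence of feature vectors, one per frame (hypothetical defaults)."""
    return [frame_features(f) for f in frame_signal(samples, frame_len, hop)]
```

For one second of a 440 Hz tone sampled at 16 kHz, `signal_to_features` returns 98 two-dimensional vectors; the recognition stage then works on this sequence instead of the raw waveform.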


Typical Speech Recognition Process



Types of Speech Recognition


Isolated Words: a single utterance at a time.
Connected Words: separate utterances run together with a minimal pause between them.
Continuous Speech: rehearsed speech or dictation.
Spontaneous Speech: natural speech.
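Isolated-word and connected-word systems both depend on detecting the pauses between utterances. Below is a minimal sketch of energy-based endpoint detection, assuming per-frame energies have already been computed; `find_utterances`, `threshold` and `min_pause` are hypothetical names, and a fixed energy threshold is a simplification that only works in quiet conditions.

```python
def find_utterances(frame_energies, threshold, min_pause=3):
    """Group consecutive high-energy frames into (start, end) utterance spans.

    A span is closed only after `min_pause` consecutive low-energy frames, so a
    brief dip inside a word does not split it. `end` is exclusive.
    """
    spans = []
    start = None      # index of the first frame of the current utterance
    silent_run = 0    # low-energy frames seen since the last high-energy frame
    for i, e in enumerate(frame_energies):
        if e >= threshold:
            if start is None:
                start = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_pause:
                # The utterance ended at the last high-energy frame
                spans.append((start, i - silent_run + 1))
                start = None
                silent_run = 0
    if start is not None:
        # Signal ended mid-utterance; trim any trailing low-energy frames
        spans.append((start, len(frame_energies) - silent_run))
    return spans
```

Choosing `min_pause` trades off the two cases: a larger value stops brief dips inside a word from splitting it, but may merge connected words whose separating pauses are minimal.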


Algorithms


Dynamic Time Warping: an algorithm for measuring similarity between two sequences that may vary in time or speed.
Hidden Markov Models
Neural Networks
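Dynamic time warping can be sketched in a few lines of dynamic programming. This minimal version assumes 1-D sequences with absolute difference as the local distance (a hypothetical default; real systems compare feature vectors). Each cell of the cost table holds the cheapest alignment of the two prefixes, so a stretched or compressed version of the same pattern still scores near zero. The word templates at the end are hypothetical and only illustrate template matching.

```python
import math

def dtw_distance(a, b, dist=lambda x, y: abs(x - y)):
    """Classic O(len(a) * len(b)) dynamic time warping distance."""
    n, m = len(a), len(b)
    # cost[i][j] = cheapest alignment of a[:i] with b[:j]
    cost = [[math.inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = dist(a[i - 1], b[j - 1])
            cost[i][j] = d + min(cost[i - 1][j],      # a advances, b's element repeats
                                 cost[i][j - 1],      # b advances, a's element repeats
                                 cost[i - 1][j - 1])  # both advance
    return cost[n][m]

# Hypothetical usage: pick the stored word template closest to the input
templates = {"yes": [1, 3, 4, 3], "no": [4, 2, 1, 1]}
spoken = [1, 1, 3, 4, 4, 3]  # "yes" spoken more slowly
best_match = min(templates, key=lambda w: dtw_distance(spoken, templates[w]))  # -> "yes"
```

The quadratic table is the simplest correct form; practical implementations often restrict the warping path to a band around the diagonal to save time.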


Hidden Markov Model


In an HMM, the state is not directly visible, but the output, which depends on the state, is visible.

Each state has a probability distribution over the possible output tokens. Therefore, the sequence of tokens generated by an HMM gives some information about the sequence of states.

x: states
y: possible observations
a: state transition probabilities
b: output probabilities
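Because the output tokens carry information about the hidden states, the most likely state sequence can be recovered with the Viterbi algorithm. A minimal sketch in the slide's notation (a for transitions, b for outputs); the initial-state probabilities are an added assumption not shown on the slide, and the two-state silence/speech model and its numbers are purely illustrative.

```python
def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return (most likely hidden state sequence, its probability) for `obs`.

    start_p[x]      : P(first state = x)  (assumed; not on the slide)
    trans_p[x0][x]  : a, state transition probabilities
    emit_p[x][y]    : b, output probabilities
    """
    # best[t][x] = (prob. of the best path ending in state x at time t, previous state)
    best = [{x: (start_p[x] * emit_p[x][obs[0]], None) for x in states}]
    for y in obs[1:]:
        prev = best[-1]
        layer = {}
        for x in states:
            # Best predecessor for x, scored by path probability times a
            p, back = max((prev[x0][0] * trans_p[x0][x], x0) for x0 in states)
            layer[x] = (p * emit_p[x][y], back)
        best.append(layer)
    # Trace back from the most probable final state
    last = max(states, key=lambda x: best[-1][x][0])
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        path.append(best[t][path[-1]][1])
    path.reverse()
    return path, best[-1][last][0]

# Hypothetical toy model: hidden states are silence/speech, outputs are energy levels
states = ("sil", "speech")
start_p = {"sil": 0.6, "speech": 0.4}
trans_p = {"sil": {"sil": 0.7, "speech": 0.3},
           "speech": {"sil": 0.4, "speech": 0.6}}
emit_p = {"sil": {"quiet": 0.5, "mid": 0.4, "loud": 0.1},
          "speech": {"quiet": 0.1, "mid": 0.3, "loud": 0.6}}
path, prob = viterbi(["quiet", "mid", "loud"], states, start_p, trans_p, emit_p)
```

Here `path` is `["sil", "sil", "speech"]` with probability of about 0.01512. For long utterances, a practical decoder would add log probabilities rather than multiply raw probabilities, which underflow.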


HMM Example



Neural Network


A neural network consists of an interconnected group of artificial neurons, and it processes information using a connectionist approach to computation.

An NN is typically defined by three types of parameters:
The interconnection pattern between the different layers of neurons
The learning process for updating the weights of the interconnections
The activation function that converts a neuron's weighted input to its output activation
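The three parameters can be seen even in the smallest network. A minimal sketch, assuming a single sigmoid neuron fully connected to its inputs (the interconnection pattern), trained by stochastic gradient descent on log-loss (the learning process), with the logistic sigmoid as its activation function. The toy AND task, learning rate and epoch count are hypothetical choices for illustration; a speech recognizer would use many neurons in several layers.

```python
import math
import random

def sigmoid(z):
    """Activation function: squashes the weighted input into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, bias, x):
    """Weighted input of one neuron passed through the activation function."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + bias)

def train_neuron(data, lr=0.5, epochs=5000, seed=0):
    """Learning process: per-example gradient descent on the log-loss."""
    rng = random.Random(seed)
    n_inputs = len(data[0][0])
    w = [rng.uniform(-0.5, 0.5) for _ in range(n_inputs)]  # interconnection weights
    bias = 0.0
    for _ in range(epochs):
        for x, target in data:
            # For a sigmoid output with log-loss, dLoss/d(weighted input) = out - target
            err = predict(w, bias, x) - target
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
            bias -= lr * err
    return w, bias

# Hypothetical toy task: learn logical AND
data = [((0, 0), 0), ((0, 1), 0), ((1, 0), 0), ((1, 1), 1)]
w, bias = train_neuron(data)
```

After training, rounding `predict(w, bias, x)` for the four inputs reproduces the AND truth table.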


Speech Recognition Software


Open source: Julius
Macintosh: Dragon Dictate
Mobile devices/smartphones: Google Now, Siri, Micromax AISHA (Artificial Intelligence Speech Handset Assistant), S Voice, Iris (Intelligent Rival Imitator of Siri)
Windows: Dragon NaturallySpeaking, Windows Speech Recognition


Applications


Games and edutainment
Data entry
Document editing
Speaker identification/verification
Automation at call centers
Medical/disabilities
Fighter aircraft


Advantages


Increases productivity
Can help with menial computer tasks
Can help people with disabilities
Cost-effective
Reduces spelling mistakes


Disadvantages


Inaccuracy and slowness
Vocal strain
Adaptability
Out-of-vocabulary (OOV) words
Spontaneous speech
Accent, dialect and mixed language
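Inaccuracy is commonly measured as word error rate (WER): the minimum number of word substitutions, deletions and insertions needed to turn the recognizer's output into the reference transcript, divided by the number of reference words. A minimal sketch using word-level edit distance; `word_error_rate` is a hypothetical helper name, and whitespace tokenization is a simplifying assumption.

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + deletions + insertions) / number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must contain at least one word")
    # d[i][j] = minimum edits to turn ref[:i] into hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match or substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, hearing "recognize speech" as "wreck a nice beach" gives a WER of 2.0: WER is not capped at 1 when the output contains extra words.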


Future Scope


Achieving efficient speaker-independent word recognition.

Speech recognition systems (SRS) may gain the ability to distinguish nuances of speech and the meanings of words.

Stand-alone speech recognition systems.

Wearable speech recognition systems that let users talk with all their devices.


Conclusion


Within five years, speech recognition technology will become so pervasive in our daily lives that service environments lacking this technology will be considered inferior.

Speech recognition will revolutionize the way people interact with smart devices and will, ultimately, differentiate the upcoming technologies.




Thank you!

Any questions?
