

Application of Noisy Channel, Channel Entropy

CS 621 Artificial Intelligence

Lecture 15 - 06/09/05

Prof. Pushpak Bhattacharyya


S = {s1, s2, …, sq}        R = {t1, t2, …, tq}

S → Noisy Channel → R

SPEECH RECOGNITION (ASR – Automatic Speech Recognition)

- Signal processing (low level).
- Cognitive processing (higher-level categories).
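As a rough illustration of the picture above, the channel can be modelled as a conditional distribution P(t|s) over received symbols given sent symbols. The sketch below is a minimal simulation under that assumption; the symbol names and probabilities are invented purely for illustration.

```python
import random

# Hypothetical channel: P(received t | sent s); symbols and numbers are
# invented purely for illustration.
CHANNEL = {
    "s1": {"t1": 0.9, "t2": 0.1},
    "s2": {"t1": 0.2, "t2": 0.7, "t3": 0.1},
}

def transmit(symbol):
    """Sample a received symbol t for a sent symbol s according to P(t|s)."""
    dist = CHANNEL[symbol]
    return random.choices(list(dist.keys()), weights=list(dist.values()), k=1)[0]

print(transmit("s1"))  # usually 't1', occasionally 't2'
```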


Noisy Channel Metaphor

Due to Jelinek (IBM), 1970s.

Main field of study – speech.

Problem Definition

S = {speech signals} = {s1, s2, …, sp}

R = {words} = {w1, w2, …, wq}

The channel maps {s1, s2, …, sp} to {w1, w2, …, wq}.


Special and Easier Case

Isolated Word Recognition (IWR)

Complexity due to 'word boundary' detection does not arise.

Example: 'I got a plate' vs 'I got up late'.


Homophones and Homographs

Homophones: words that have the same pronunciation.
Example: bear, beer

Homographs: words that have the same spelling but different meanings.
Example: bank – river bank vs finance bank


World of Sounds

World of sounds (speech signals): Phonetics, Phonology.

World of words (orthography): letters – consonants and vowels.


The alphabet-to-sound mapping is not one to one, especially for vowels.

Example: 'tomato' → 'tomaeto' or 'tomaato'.


Sound Variations

Lexical variations: "because" can be realised as "'cause" or "because".

Allophonic variations: "because" can be realised as "because" or "becase".


Allophonic variations – a more remarkable example:

Do → [d] [u]
Go → [g] [o]


Socio-cultural variations: "something" → "something" (formal) or "somethin" (informal).

Dialectal variation: "very" → "bheri" in Bengal; "apple" → "ieple" in the south, "eple" in the north, "aapel" in Bengal.


Orthography → Phonology

Mapping orthography to phonology is a complex problem, very difficult to model with a rule-governed system.


Probabilistic Approach

W* = best estimate for the word, given the speech signal s.

s → Noisy Channel → W*

W* = argmax over w in Words of P(w|s)
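A minimal sketch of this argmax step, assuming the posterior P(w|s) has already been scored for a few candidate words; the candidate words and the numbers are invented for illustration.

```python
# Hypothetical posterior scores P(w | s) for one observed speech signal s.
posterior = {"plate": 0.42, "late": 0.35, "plait": 0.23}

# W* = argmax over w of P(w | s)
w_star = max(posterior, key=posterior.get)
print(w_star)  # 'plate'
```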


P(w|s) is called the 'parameter' of the system.

Estimation ≡ Training

The probability values need to be estimated from speech corpora: record the speech of many speakers.
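One way to picture the training step is maximum-likelihood counting over an annotated corpus of (word, observed pronunciation) pairs. The corpus below is purely hypothetical and tiny; real estimation works over large recorded and transcribed corpora.

```python
from collections import Counter, defaultdict

# Hypothetical annotated corpus: (word, observed pronunciation) pairs.
corpus = [
    ("because", "because"),
    ("because", "becase"),
    ("because", "'cause"),
    ("tomato", "tomaato"),
    ("tomato", "tomaeto"),
]

# Count how often each word w is realised as each pronunciation s.
counts = defaultdict(Counter)
for word, pron in corpus:
    counts[word][pron] += 1

def p_s_given_w(s, w):
    """Relative-frequency estimate of P(s | w)."""
    total = sum(counts[w].values())
    return counts[w][s] / total if total else 0.0

print(p_s_given_w("becase", "because"))  # 1/3 on this toy corpus
```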


Look of Speech Corpora

Annotation – unique pronunciation: each speech signal is paired with its transcription (e.g. the signal for 'apple').


Repository of Standard Sound Symbols

IPA – International Phonetic Alphabet (from the International Phonetic Association).

ARPABET – the American phonetic standard.


Augment the Roman alphabet with Greek symbols:

e → [ɛ] in 'ebb', [i] in 'need'

top → [t] (IPA)

tool → [θ] (IPA)


Speech corpora are annotated with IPA/ARPABET symbols.
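To make the annotation format concrete, a corpus or lexicon entry can be pictured as a word paired with a phone sequence. The sketch below uses ARPABET-style labels recalled from CMUdict-like resources; treat the exact symbols as illustrative assumptions rather than authoritative transcriptions.

```python
# Hypothetical lexicon: word -> list of ARPABET-style pronunciations.
# (Labels shown from memory for illustration; consult the ARPABET/CMUdict
# tables for the authoritative symbol inventory.)
LEXICON = {
    "tomato": [
        ["T", "AH", "M", "EY", "T", "OW"],  # 'tomaeto'
        ["T", "AH", "M", "AA", "T", "OW"],  # 'tomaato'
    ],
    "apple": [
        ["AE", "P", "AH", "L"],
    ],
}

for word, prons in LEXICON.items():
    for pron in prons:
        print(word, "->", " ".join(pron))
```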

Indian Scenario

Hindi – TIFR
Marathi – IITB
Tamil – IITM


How to estimate P(w|s) from speech corpora?

Naively: count(w, s) / count(s). It is not done this way.


Apply Bayes' Theorem:

P(w|s) = P(w) · P(s|w) / P(s)

W* = argmax over w of [ P(w) · P(s|w) / P(s) ]


Since P(s) does not depend on w, it can be dropped:

W* = argmax over w in Words of P(w) · P(s|w)

P(w) = prior = language model.

P(s|w) = likelihood of w being pronounced as s = acoustic model.
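Putting the two models together, a toy decoder might look like the sketch below. The candidate words, priors, and likelihoods are all assumed numbers for illustration; real systems score many more hypotheses.

```python
# Hypothetical language model P(w) and acoustic model P(s | w)
# for one observed signal s; all numbers invented for illustration.
language_model = {"need": 0.010, "knee": 0.002}
acoustic_model = {"need": 0.30, "knee": 0.35}

# W* = argmax over w of P(w) * P(s | w); P(s) is constant and dropped.
w_star = max(language_model, key=lambda w: language_model[w] * acoustic_model[w])
print(w_star)  # 'need': the stronger prior outweighs the slightly lower acoustic score
```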


Acoustic Model

Pronunciation dictionary (Finite State Automata).

Manually built – a costly resource.

Example: a finite-state automaton for 'tomato' with states 0–6; the arcs spell t, o, m, then branch on 'aa' vs 'ae', then t, o – encoding the two pronunciations 'tomaato' and 'tomaeto'.
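A sketch of such an automaton in code, assuming the state numbering and phone labels above (chosen for illustration, not taken from an actual pronunciation dictionary):

```python
# Hypothetical FSA for 'tomato': states 0..6, branching on 'aa' vs 'ae'.
TRANSITIONS = {
    (0, "t"): 1,
    (1, "o"): 2,
    (2, "m"): 3,
    (3, "aa"): 4,  # to-maa-to
    (3, "ae"): 4,  # to-mae-to
    (4, "t"): 5,
    (5, "o"): 6,
}
FINAL_STATES = {6}

def accepts(phones):
    """Return True if the phone sequence is a pronunciation the FSA allows."""
    state = 0
    for ph in phones:
        if (state, ph) not in TRANSITIONS:
            return False
        state = TRANSITIONS[(state, ph)]
    return state in FINAL_STATES

print(accepts(["t", "o", "m", "aa", "t", "o"]))  # True
print(accepts(["t", "o", "m", "i", "t", "o"]))   # False
```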


W* is obtained from P(w) and P(s|w).

The language model?

Relative frequency of w in the corpus; relative frequency ≡ unigram model.

Suppose P(knee) > P(need) as unigrams. Then for 'I _ _ _ _ _':

knee → high probability, need → low probability under the unigram model, even though 'I need' is the natural continuation.
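A unigram language model is just relative frequency over a corpus, as shown in the minimal sketch below; the toy corpus is invented so that 'knee' happens to outnumber 'need', mirroring the point above.

```python
from collections import Counter

# Tiny invented corpus; real language models are estimated over millions of words.
tokens = "i need tea my knee hurts i need rest the knee the knee".split()

counts = Counter(tokens)
total = len(tokens)

def p_unigram(w):
    """P(w) as relative frequency in the corpus."""
    return counts[w] / total

print(p_unigram("knee"), p_unigram("need"))  # 3/13 vs 2/13: knee outscores need
```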


Language Modelling by N-grams

N = 2 – bigrams.

N = 3 – trigrams (empirically the best for English).
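For contrast with the unigram model, a bigram model conditions each word on the previous one, so 'I need' can outscore 'I knee' even when 'knee' is the more frequent word overall. A minimal sketch on the same invented toy corpus:

```python
from collections import Counter

tokens = "i need tea my knee hurts i need rest the knee the knee".split()

unigram = Counter(tokens)
bigram = Counter(zip(tokens, tokens[1:]))

def p_bigram(w, prev):
    """P(w | prev) as the relative frequency of the bigram (prev, w)."""
    return bigram[(prev, w)] / unigram[prev] if unigram[prev] else 0.0

print(p_bigram("need", "i"), p_bigram("knee", "i"))  # 1.0 vs 0.0 on this corpus
```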