Speech Interfaces User Interfaces Spring 1998 Drew Roselli


Speech Interfaces

User Interfaces Spring 1998

Drew Roselli


Motivation: Mechanical

• Smaller devices => difficult I/O

• Speed, > 90 wpm (?)

• “Virtually unlimited” set of commands

• Freedom for other body parts


Motivation: User

• Natural

• Easy to remember

• Evolutionarily selected for

– reading and writing are not

– neither is typing


Speech Background

• Speech is faster than the vocal apparatus can move, so sounds overlap

» nasals spread into neighboring sounds

• Phonetic rules provide redundancy

» taboo combinations, e.g. the “sr” in “Srini”

» contextual pronunciation:

/t/ -> aspirated, flap, unreleased


Speech Recognition

• Often misunderstood by people

» continuous feedback

• Longer words are easier

• Maximally different vowels: a, i, u

• Individual training

» gender-based

» “meaningless” conversation openers


Speech Production

• Three formants visible on oscilloscope

• Harmonics from larynx, throat, mouth

• Two formants suffice for recognition but sound “tinny”

• 1989 demo

– http://cahn.www.media.mit.edu/people/cahn/emot-speech.html


More Gratuitous Opinions

(I’m really talking out of my butt here.)

• Recently a visual culture

• TV generation requires illustrated textbooks

• Notes mean “I’ll learn it later”

• Oral tradition has a strong history

– http://www.missouri.edu/~csottime/index.html

Could we go verbal?


Recognition Problems

• Poor recognition

– humans < 1% error rate on dictation

– Janus 7% error rate (how much context?)

– Janus 20% in real time

• Background noise

• Slow

– (simple matter of hardware)

• Homonym-rich languages (Cantonese)
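
The slide does not say how the error rates above were measured. A common convention for dictation is word error rate: the word-level edit distance between what was said and what was recognized, divided by the number of words said. The sketch below assumes that convention; the function name and the example sentences are illustrative, not taken from the Janus evaluation.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words."""
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] holds the edit distance between the reference words
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr.append(min(
                prev[j] + 1,         # reference word dropped by the recognizer
                curr[j - 1] + 1,     # extra word inserted by the recognizer
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One wrong word out of ten gives a 10% error rate.
    print(word_error_rate("a b c d e f g h i j", "a b c d e f g h i x"))
```

Under this measure, a 7% error rate means roughly one wrong word in every fourteen.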


More Recognition Problems

• Isolated, short words difficult

– common words become short

• Segmentation

– “silly” versus “sill lea”

• No semantic help

• Spelling

– interface with printer, mail
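
The segmentation problem can be made concrete with a toy example. The sketch below assumes a hypothetical three-word lexicon with made-up one-character phoneme symbols, and assumes that identical sounds on either side of a word boundary merge in fluent speech. It enumerates every word sequence that would produce the same sound stream; without semantic help, nothing in the stream itself picks one over the other.

```python
from itertools import groupby, product

# Hypothetical lexicon: word -> rough phoneme string (one character per sound).
LEXICON = {
    "silly": "sIli",
    "sill": "sIl",
    "lea": "li",
}


def collapse(phonemes: str) -> str:
    """Merge adjacent identical phonemes (a simplifying assumption)."""
    return "".join(key for key, _ in groupby(phonemes))


def candidate_readings(heard: str, lexicon: dict, max_words: int = 3) -> list:
    """Return every word sequence whose collapsed pronunciation equals `heard`."""
    readings = []
    for n in range(1, max_words + 1):
        for words in product(lexicon, repeat=n):
            spoken = collapse("".join(lexicon[w] for w in words))
            if spoken == heard:
                readings.append(list(words))
    return readings


if __name__ == "__main__":
    print(candidate_readings("sIli", LEXICON))
    # [['silly'], ['sill', 'lea']]
```

The brute-force search is only workable for a toy lexicon; it is here to show the ambiguity, not how a recognizer searches.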


UI Problems: Navigation

• Aural no-nos

– modes

– deep hierarchies

• Speech analog

• Grammar = how to re-structure linear sequence of words

Is there a UI equivalent?


UI Problems: Feedback

• Verbose feedback wastes time/patience

– only confirm consequential things

– use meaningful, short cues

• Interruption

– half-duplex communication

– real-time scheduling


UI Problems: Meaning

• “Do what I mean not what I say”

• Silence means “Do the right thing”


VoiceNotes

• Voice-based file system

• Replacement for tapes

• “Hierarchical” access to voice data

• Thorough documentation of problems


SpeechActs

• Speech interface to computer tools

– email, calendar, weather, stock quotes

• Conversions to canonical form

– keyword based? confused by negations?

• Inconsistent recognition

– misunderstand system

– progressive assistance

– implicit confirmation
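
The question about negations can be illustrated with a sketch. It assumes a purely keyword-based conversion, which the slide only raises as a possibility; it is not SpeechActs' actual method. The keyword table and command names are hypothetical.

```python
import re
from typing import Optional

# Hypothetical keyword table: trigger word -> canonical command.
KEYWORDS = {
    "delete": "DELETE_MESSAGE",
    "read": "READ_MESSAGE",
    "reply": "REPLY_TO_MESSAGE",
}


def canonical_form(utterance: str) -> Optional[str]:
    """Return the command for the first keyword in the utterance, if any."""
    for word in re.findall(r"[a-z']+", utterance.lower()):
        if word in KEYWORDS:
            return KEYWORDS[word]
    return None


if __name__ == "__main__":
    print(canonical_form("Please delete this message"))  # DELETE_MESSAGE
    print(canonical_form("Don't delete this message"))   # DELETE_MESSAGE
```

Both utterances map to the same command because the matcher never looks at "don't". A system built this way would need either negation handling or a confirmation step before consequential actions.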


Multimodal Error Correction

• Dictation error correction study

• Results very unclear

• Recognizer got it wrong the first time => will get it wrong the second time

– hyperarticulating aggravates the problem

• Correct dictation errors with: vocal spelling, writing, typing, etc.