
Speech Interfaces

User Interfaces Spring 1998

Drew Roselli

Motivation: Mechanical

• Smaller devices => difficult I/O

• Speed, > 90 wpm (?)

• “Virtually unlimited” set of commands

• Freedom for other body parts

Motivation: User

• Natural

• Easy to remember

• Evolutionarily selected for

– reading and writing are not

– neither is typing

Speech Background

• Speech outruns the vocal apparatus, so neighboring sounds overlap

» nasals spread into adjacent sounds

• Phonetic rules provide redundancy

» taboo combinations, e.g. “sr” as in “Srini” is not a native English onset

» contextual pronunciation:

/t/ -> aspirated, flap, unreleased
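The two phonetic rules above can be sketched as code. This is a toy model under stated assumptions: words are lists of rough phonemic symbols, the taboo-onset set is a hypothetical three-item sample, and the /t/ rule is a simplified American English rule that ignores stress (real flapping also needs the following vowel to be unstressed).

```python
# Rough phonemic symbols; a real system would use a full phone set (assumption).
VOWELS = {"a", "e", "i", "o", "u"}

# Hypothetical, deliberately tiny set of two-phoneme onsets English does not allow.
TABOO_ONSETS = {("s", "r"), ("t", "l"), ("d", "l")}

ASPIRATED = "t\u02b0"   # tʰ
FLAP = "\u027e"         # ɾ
UNRELEASED = "t\u031a"  # t̚
PLAIN = "t"


def has_taboo_onset(phones):
    """True if the word starts with a consonant pair English phonotactics forbids."""
    return tuple(phones[:2]) in TABOO_ONSETS


def realize_t(phones, i):
    """Pick a surface form for the /t/ at phones[i] (toy rule, stress not modeled)."""
    if phones[i] != "t":
        raise ValueError("phones[i] is not /t/")
    prev = phones[i - 1] if i > 0 else None
    nxt = phones[i + 1] if i + 1 < len(phones) else None
    if nxt is None:
        return UNRELEASED      # word-final, as in "cat": often unreleased
    if prev in VOWELS and nxt in VOWELS:
        return FLAP            # between vowels, as in "water"
    if prev is None and nxt in VOWELS:
        return ASPIRATED       # word-initial before a vowel, as in "top"
    return PLAIN               # elsewhere, e.g. after /s/ in "stop"
```

Because a listener knows these rules, a heard flap or an odd onset narrows the candidate words, which is the redundancy the slide refers to.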

Speech Recognition

• Even people often misunderstand speech

» conversation relies on continuous feedback

• Longer words are easier

• Maximally different vowels: a, i, u

• Individual training

» gender-based

» “meaningless” conversation openers

Speech Production

• Three formants visible on oscilloscope

• Harmonics from larynx, throat, mouth

• Two formants are enough for recognition, but the result sounds “tinny”

• 1989 demo

– http://cahn.www.media.mit.edu/people/cahn/emot-speech.html
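The claim that two formants are enough for recognition can be illustrated with a nearest-neighbor vowel classifier that looks only at the first two formants, F1 and F2. The formant values are approximate averages often quoted for adult male speakers of American English and are an assumption here; real values vary widely by speaker. The three vowels /i/, /a/, /u/ from the recognition slide sit at the corners of the F1/F2 plane, which is one way to see why they are "maximally different".

```python
import math

# Approximate average formant frequencies (Hz), adult male speakers (assumption).
VOWEL_FORMANTS = {
    "i": (270, 2290),   # as in "beet"
    "a": (730, 1090),   # as in "father"
    "u": (300, 870),    # as in "boot"
}


def classify_vowel(f1, f2, table=VOWEL_FORMANTS):
    """Return the vowel whose (F1, F2) point is nearest to the measured one."""
    return min(table, key=lambda v: math.dist((f1, f2), table[v]))
```

Plain Euclidean distance in Hz is a crude choice: F2 spans a wider range than F1, so it dominates the distance.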

More Gratuitous Opinions (I’m really talking out of my butt here.)

• Only recently a visual culture

• The TV generation requires illustrated textbooks

• Notes mean “I’ll learn it later”

• Oral tradition has a strong history

– http://www.missouri.edu/~csottime/index.html

Could we go verbal?

Recognition Problems

• Poor recognition

– humans: < 1% error rate on dictation

– Janus: 7% error rate (how much context?)

– Janus: 20% error rate in real time

• Background noise

• Slow (a simple matter of hardware)

• Homonym-rich languages (Cantonese)
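The slide does not say how the Janus error rates were measured. One standard metric for dictation is word error rate (WER): the minimum number of word substitutions, insertions, and deletions needed to turn the reference transcript into the recognizer's output, divided by the number of reference words. A minimal sketch using a Levenshtein edit distance over words:

```python
def word_error_rate(reference, hypothesis):
    """WER = (substitutions + insertions + deletions) / reference word count."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # d[i][j] = fewest edits turning the first i reference words
    # into the first j hypothesis words.
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i                      # delete all i reference words
    for j in range(len(hyp) + 1):
        d[0][j] = j                      # insert all j hypothesis words
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # substitution or match
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, hearing "read my blue mail" for "read my new mail" is one substitution in four words, a WER of 0.25. Insertions can push WER above 1.0, so it is not a bounded percentage.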

More Recognition Problems

• Isolated, short words are difficult

– common words tend to become short

• Segmentation

– “silly” versus “sill lea”

• No semantic help

• Spelling

– interface with printer, mail
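The segmentation problem can be made concrete by enumerating every way a phoneme stream can be split into dictionary words. This sketch assumes a hypothetical three-word lexicon in ARPAbet-style symbols (stress omitted) and one simplifying rule: when a word ends with the phoneme the next word begins with, the listener hears it only once, which is why "sill lea" and "silly" sound alike.

```python
# Toy pronouncing lexicon, ARPAbet-style symbols, stress omitted (hypothetical).
LEXICON = {
    "silly": ("S", "IH", "L", "IY"),
    "sill": ("S", "IH", "L"),
    "lea": ("L", "IY"),
}


def segmentations(phones, lexicon=LEXICON):
    """Return every way to cover the phoneme sequence `phones` with lexicon words.

    Assumption: a phoneme shared across a word boundary is pronounced once.
    """
    phones = tuple(phones)
    results = []

    def search(pos, words, last):
        if pos == len(phones):
            results.append(" ".join(words))
            return
        for word, pron in lexicon.items():
            # Ordinary case: the word's phonemes start at `pos`.
            if phones[pos:pos + len(pron)] == pron:
                search(pos + len(pron), words + [word], pron[-1])
            # Shared-boundary case: the word's first phoneme was the previous
            # word's last one, so only the rest must match here.
            if (words and len(pron) > 1 and pron[0] == last
                    and phones[pos:pos + len(pron) - 1] == pron[1:]):
                search(pos + len(pron) - 1, words + [word], pron[-1])

    search(0, [], None)
    return results
```

On S IH L IY this returns both "silly" and "sill lea"; nothing in the sound alone picks one, which is where the missing semantic help would come in.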

UI Problems: Navigation

• Aural no-nos

– modes

– deep hierarchies

• Speech analog

• Grammar = how to restructure a linear sequence of words

Is there a UI equivalent?

UI Problems: Feedback

• Verbose feedback wastes time/patience

– only confirm consequential things

– use meaningful, short cues

• Interruption

– half-duplex communication

– real-time scheduling
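The feedback guidelines suggest a simple policy: ask for explicit confirmation only when an action is consequential, and otherwise act at once with a short cue. A minimal sketch, assuming a hypothetical Action record whose undoable flag stands in for "consequential"; the prompt wording is illustrative.

```python
from dataclasses import dataclass


@dataclass
class Action:
    verb: str        # e.g. "delete"
    target: str      # e.g. "message 3"
    undoable: bool   # hypothetical flag: can the user cheaply undo it?


def feedback_for(action):
    """Return ("ask", prompt) to wait for a yes/no, or ("do", cue) to act now."""
    if not action.undoable:
        # Consequential: explicit confirmation before acting.
        return ("ask", f"{action.verb.capitalize()} {action.target}?")
    # Inconsequential: act immediately; a brief echo of the target serves as
    # an implicit confirmation that the user can interrupt.
    return ("do", action.target.capitalize() + ".")
```

Deleting all messages yields ("ask", "Delete all messages?"), while reading message 3 yields just ("do", "Message 3.").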

UI Problems: Meaning

• “Do what I mean, not what I say”

• Silence means “Do the right thing”

VoiceNotes

• Voice-based file system

• Replacement for tapes

• “Hierarchical” access to voice data

• Thorough documentation of problems

SpeechActs

• Speech interface to computer tools

– email, calendar, weather, stock quotes

• Conversions to canonical form

– keyword-based? confused by negations?

• Inconsistent recognition

– users misunderstand the system

– progressive assistance

– implicit confirmation
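The questions about canonical form can be made concrete. This is not SpeechActs' implementation; it is a hypothetical keyword matcher, with a made-up keyword table and command names, that shows why a purely keyword-based conversion would be confused by negations, plus one crude guard.

```python
import re

# Hypothetical keyword table mapping words to canonical commands.
KEYWORDS = {
    "mail": "READ_MAIL",
    "calendar": "SHOW_CALENDAR",
    "weather": "GET_WEATHER",
    "stock": "GET_QUOTE",
}
NEGATIONS = {"not", "don't", "dont", "never", "no"}


def canonicalize(utterance, handle_negation=False):
    """Map an utterance to the first matching canonical command, or None."""
    words = re.findall(r"[a-z']+", utterance.lower())
    for i, word in enumerate(words):
        if word in KEYWORDS:
            if handle_negation and any(w in NEGATIONS for w in words[:i]):
                return None  # negated: refuse to act rather than guess
            return KEYWORDS[word]
    return None
```

Without the guard, "Don't read my mail" maps to READ_MAIL just like "Read my mail". The guard is itself crude: it would also block "Is there no new mail?".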

Multimodal Error Correction

• Dictation error correction study

• Results very unclear

• If the recognizer got it wrong the first time, it will get it wrong the second time

– hyperarticulating aggravates the problem

• Correct dictation errors with:

– vocal spelling, writing, typing, etc.
