characterizing and processing robot-directed speech
DESCRIPTION
Characterizing and Processing Robot-Directed Speech. Paulina Varchavskaia • Paul Fitzpatrick • Cynthia Breazeal MIT AI Lab Humanoid Robotics Group. Characterizing and Processing Robot-Directed Speech. Paulina Varchavskaia • Paul Fitzpatrick • Cynthia Breazeal - PowerPoint PPT PresentationTRANSCRIPT
![Page 1: Characterizing and Processing Robot-Directed Speech](https://reader035.vdocuments.mx/reader035/viewer/2022062517/56812d82550346895d9292ae/html5/thumbnails/1.jpg)
Characterizing and Processing
Robot-Directed Speech
Paulina Varchavskaia • Paul Fitzpatrick • Cynthia Breazeal MIT AI Lab Humanoid Robotics Group
![Page 2: Characterizing and Processing Robot-Directed Speech](https://reader035.vdocuments.mx/reader035/viewer/2022062517/56812d82550346895d9292ae/html5/thumbnails/2.jpg)
Characterizing and Processing
Robot-Directed Speech
Paulina Varchavskaia • Paul Fitzpatrick • Cynthia Breazeal MIT AI Lab Humanoid Robotics Group
![Page 3: Characterizing and Processing Robot-Directed Speech](https://reader035.vdocuments.mx/reader035/viewer/2022062517/56812d82550346895d9292ae/html5/thumbnails/3.jpg)
Characterizing and Processing
Robot-Directed Speech
Paulina Varchavskaia • Paul Fitzpatrick • Cynthia Breazeal MIT AI Lab Humanoid Robotics Group
![Page 4: Characterizing and Processing Robot-Directed Speech](https://reader035.vdocuments.mx/reader035/viewer/2022062517/56812d82550346895d9292ae/html5/thumbnails/4.jpg)
Characterizing and Processing
Robot-Directed Speech
Paulina Varchavskaia • Paul Fitzpatrick • Cynthia Breazeal MIT AI Lab Humanoid Robotics Group
![Page 5: Characterizing and Processing Robot-Directed Speech](https://reader035.vdocuments.mx/reader035/viewer/2022062517/56812d82550346895d9292ae/html5/thumbnails/5.jpg)
![Page 6: Characterizing and Processing Robot-Directed Speech](https://reader035.vdocuments.mx/reader035/viewer/2022062517/56812d82550346895d9292ae/html5/thumbnails/6.jpg)
![Page 7: Characterizing and Processing Robot-Directed Speech](https://reader035.vdocuments.mx/reader035/viewer/2022062517/56812d82550346895d9292ae/html5/thumbnails/7.jpg)
![Page 8: Characterizing and Processing Robot-Directed Speech](https://reader035.vdocuments.mx/reader035/viewer/2022062517/56812d82550346895d9292ae/html5/thumbnails/8.jpg)
![Page 9: Characterizing and Processing Robot-Directed Speech](https://reader035.vdocuments.mx/reader035/viewer/2022062517/56812d82550346895d9292ae/html5/thumbnails/9.jpg)
Speech Recognition
What speech does the robot evoke?
How can we process that speech?
– Support a growing vocabulary
– Without hurting recognition accuracy
![Page 10: Characterizing and Processing Robot-Directed Speech](https://reader035.vdocuments.mx/reader035/viewer/2022062517/56812d82550346895d9292ae/html5/thumbnails/10.jpg)
Baseline System
Extending Vocabulary
Extending Vocabulary
![Page 11: Characterizing and Processing Robot-Directed Speech](https://reader035.vdocuments.mx/reader035/viewer/2022062517/56812d82550346895d9292ae/html5/thumbnails/11.jpg)
Baseline System
“Magic word” to introduce vocabulary
Separation of introduction from use
Robot confused by unknown/filler words
![Page 12: Characterizing and Processing Robot-Directed Speech](https://reader035.vdocuments.mx/reader035/viewer/2022062517/56812d82550346895d9292ae/html5/thumbnails/12.jpg)
Study: Characterizing Speech
13 children
5-10 years old
20 minute sessions
Some prompting (Turkle et al)
![Page 13: Characterizing and Processing Robot-Directed Speech](https://reader035.vdocuments.mx/reader035/viewer/2022062517/56812d82550346895d9292ae/html5/thumbnails/13.jpg)
![Page 14: Characterizing and Processing Robot-Directed Speech](https://reader035.vdocuments.mx/reader035/viewer/2022062517/56812d82550346895d9292ae/html5/thumbnails/14.jpg)
0 10 20 30 40 50 60 70 80 90 1000
1
2
3
4
5
6
7
8
9
10
children (cumulative %)
nu
mb
er o
f u
tter
ance
s p
er m
inu
te (approx)
Number of Utterances10 20 30 40 50 60 70 80 90 100
![Page 15: Characterizing and Processing Robot-Directed Speech](https://reader035.vdocuments.mx/reader035/viewer/2022062517/56812d82550346895d9292ae/html5/thumbnails/15.jpg)
0 10 20 30 40 50 60 70 80 90 1000
10
20
30
40
50
60
70
80
90
100
children (cumulative %)
sin
gle
wo
rd u
tter
ance
s (%
)Single Word Utterances
10 20 30 40 50 60 70 80 90 100
![Page 16: Characterizing and Processing Robot-Directed Speech](https://reader035.vdocuments.mx/reader035/viewer/2022062517/56812d82550346895d9292ae/html5/thumbnails/16.jpg)
0 10 20 30 40 50 60 70 80 90 1000
10
20
30
40
50
60
70
80
90
100
children (cumulative %)
utt
eran
ces
wit
h e
nu
nci
atio
n (
%)
Utterances with Enunciation10 20 30 40 50 60 70 80 90 100
![Page 17: Characterizing and Processing Robot-Directed Speech](https://reader035.vdocuments.mx/reader035/viewer/2022062517/56812d82550346895d9292ae/html5/thumbnails/17.jpg)
Cooperative Speech Exists
Some clean word-learning scenarios– “Tiffany”
Can we use them as leverage?– “My name is Tiffany”
What recognition rates are possible?
![Page 18: Characterizing and Processing Robot-Directed Speech](https://reader035.vdocuments.mx/reader035/viewer/2022062517/56812d82550346895d9292ae/html5/thumbnails/18.jpg)
Study: Processing Speech
“LCSINFO” domain (Glass, Weinstein)
Provide some initial vocabulary
Build language model without transcripts
![Page 19: Characterizing and Processing Robot-Directed Speech](https://reader035.vdocuments.mx/reader035/viewer/2022062517/56812d82550346895d9292ae/html5/thumbnails/19.jpg)
Language Model
wordn
word2
word1
![Page 20: Characterizing and Processing Robot-Directed Speech](https://reader035.vdocuments.mx/reader035/viewer/2022062517/56812d82550346895d9292ae/html5/thumbnails/20.jpg)
OOV
Out Of Vocabulary Model
wordn
word2
word1
![Page 21: Characterizing and Processing Robot-Directed Speech](https://reader035.vdocuments.mx/reader035/viewer/2022062517/56812d82550346895d9292ae/html5/thumbnails/21.jpg)
Out Of Vocabulary Model
OOVphonen
phone2
phone1
(Bazzi)
![Page 22: Characterizing and Processing Robot-Directed Speech](https://reader035.vdocuments.mx/reader035/viewer/2022062517/56812d82550346895d9292ae/html5/thumbnails/22.jpg)
Hypothesized transcript
N-Best hypotheses
Run recognizer
Extract OOV fragments
Identify competition
Identify rarely-used additions
Remove from lexiconAdd to lexicon
Update lexicon,
baseforms
Update Language Model
![Page 23: Characterizing and Processing Robot-Directed Speech](https://reader035.vdocuments.mx/reader035/viewer/2022062517/56812d82550346895d9292ae/html5/thumbnails/23.jpg)
0 20 40 60 80 1000
10
20
30
40
50
60
70
80
90
100
coverage (%)
keyw
ord
erro
r rat
e (%
)Baseline performance Performance after clustering
K
eyw
ord
Err
or
Rat
e (%
)
coverage (%)
![Page 24: Characterizing and Processing Robot-Directed Speech](https://reader035.vdocuments.mx/reader035/viewer/2022062517/56812d82550346895d9292ae/html5/thumbnails/24.jpg)
Qualitative Results
Given:1600 Utterances
“email, phone, room, office, address”
Finds:1. n ah m b er 6. p l iy z2. w eh r ih z 7. ae ng k y uw3. w ah t ih z 8. n ow4. t eh l m iy 9. hh aw ax b
aw5. k ix n y uw 10. g r uw p
![Page 25: Characterizing and Processing Robot-Directed Speech](https://reader035.vdocuments.mx/reader035/viewer/2022062517/56812d82550346895d9292ae/html5/thumbnails/25.jpg)
Conclusions
What about acoustic models?
Idiosyncratic vocabulary is important
Pick a good name for your robot!
![Page 26: Characterizing and Processing Robot-Directed Speech](https://reader035.vdocuments.mx/reader035/viewer/2022062517/56812d82550346895d9292ae/html5/thumbnails/26.jpg)
Acknowledgements
MIT Initiative on Technology and Self
MIT Spoken Language Systems group
DARPA
NTT