Detecting Misunderstandings in the CMU Communicator Spoken Dialog System
Presented by: Dan Bohus
Joint work with: Paul Carpenter, Chun Jin, Daniel Wilson, Rong Zhang, Alex Rudnicky
Carnegie Mellon University – 2002
What's a Spoken Dialog System?
A human talking to a computer, taking turns in a goal-oriented dialog
Why Spoken Language Interfaces?
Speech: advantages and problems
Advantages:
Speech is the natural communication modality for humans
Can easily express fairly complex structures
Works well in hands- or eyes-busy situations
Problems:
Serial channel
Still an unreliable channel
Sample Spoken Dialog Systems
Interactive Voice Response (IVR) systems
Information access systems:
Air-travel planning (Communicator)
Weather information over the phone (Jupiter)
E-mail access over the phone (ELVIS)
UA baggage claims (Simon)
Other systems: guidance, personal assistants, taskable agents, etc.
A Look Under the Hood …
S: Where are you flying from?
U: from London to Paris and then on to Toronto
D: from London to Paris on then on to go on to
SI: depart_location = London, arrive_location = Paris
SO: query depart_time
NL: And when do you want to leave?
S: And when do you want to leave?
[Architecture diagram: U → Speech Recognition → D → Semanticizer (Parsing) → SI → Dialog Management (connected to the Backend) → SO → Language Generation → NL → Synthesis → S]
Roadmap
Intro to Spoken Dialog Systems
The Problem: Misunderstandings
A Learning Solution
Experiments and Results
Conclusion
Speech Recognition
Speech recognition is the main driver behind the development of spoken dialog systems.
But it is problematic:
Input signal quality
Accents, non-native speakers
Spoken language disfluencies: stutters, false-starts, /mm/, /um/
Typical Word Error Rates: 20-30%
The Impact of Recognition Errors
Errors propagate to upper levels:
They can compromise the parse, producing non-understandings
They can propagate to the Dialog Control level as misunderstandings, which have a higher cost if acted upon
Misunderstandings in Action!
S: Are you a registered user?
U: No I'm not. No [NO I'M NOT NO]
S: What is your full name?
U: Adam Schumacher [ATHENS IN AKRON]
S: There is an Athens in Georgia and in Greece. Which destination did you want?
U: Neither [NEITHER]
S: I'm sorry, I didn't catch that. There is an Athens in Georgia and in Greece. Which destination did you want?
U: Georgia [GEORGIA]
S: A flight from Athens... Where do you want to go?
U: Start over [START OVER]
S: Do you really want to start over?
U: Yes [YES]
S: What city are you leaving from?
U: Hamilton, Ontario [HILTON ONTARIO]
S: Sorry, I'm not sure I understood what you said. Where are you leaving from?
U: Hamilton [HILTON]
S: I'm still having trouble understanding you. To go on, I need you to answer the following question. What city are you leaving from?
U: Toronto [TORONTO]
Addressing the Problem
Option 1: Wait for speech recognition technology to reach better performance.
Option 2: Increase the robustness of systems when faced with poor recognition:
Detect Misunderstandings
Use Recovery Techniques
Problem Formulation
Given an input utterance and the current state of the system, detect whether or not it was correctly perceived by the system (the confidence annotation problem).
Roadmap
Intro to Spoken Dialog Systems
The Problem: Detecting Misunderstandings
A Learning Solution
Experiments and Results
Conclusion
A Classification Task
Cast the problem as a classification task
Heuristic approach: the “Garble” rule previously used in Communicator
Data-driven (learning) approach
[Diagram: Utterance → (Features) → Classifier → GOOD / BAD]
A Data-Driven Approach
Machine learning approach:
Learn to classify from a labeled training corpus
Use it to classify new instances
[Diagram: training, where Features plus GOOD/BAD labels feed the Classifier in learn mode; runtime, where Features → Classifier → GOOD/BAD]
Ingredients
Three ingredients needed for a machine learning approach:
Corpus of labeled data to use for training
Identify a set of relevant features
Choose a classification technique
Roadmap
Intro to Spoken Dialog Systems
The Problem: Misunderstandings
A Learning Solution
Training corpus
Features
Classification techniques
Experiments and Results
Conclusion
Corpus – Sources
Collected 2 months of sessions (October and November 1999)
About 300 sessions
Both developer and outsider calls
Eliminated conversations with < 5 turns:
Developers calling to check if the system is on-line
Wrong-number calls
Corpus – Structure
The Logs:
Generated automatically by various system modules
Serve as a source of features for classification (also contain the decoded utterances)
The Transcripts (the actual utterances):
Performed and double-checked by a human annotator
Provide a basis for labeling
Corpus – Labeling
Labeling was done at the concept level.
Four possible labels:
OK: the concept is okay
RBAD: recognition is bad
PBAD: parse is bad
OOD: out of domain
Aggregate utterance labels generated automatically.
Corpus – Sample Labeling
Transcript: #noise# from London to Paris and then on to Toronto #noise#
Decoded utterance: from London to Paris on then on to go on to
Parse: depart_loc:[from London] arrive_loc:[to Paris] interj:[then] resume:[go on]
Labeling: depart_loc:OK arrive_loc:OK interj:OK resume:RBAD
Aggregate label: BAD
Only 6% of the utterances actually contained mixed-type concept labels!
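As a rough illustration (not the original labeling code), the aggregation rule can be written in a few lines of Python: an utterance is GOOD only if every concept in it is labeled OK.

```python
# Minimal sketch of the automatic label aggregation described above;
# the label names come from the slides, the function itself is assumed.
CONCEPT_LABELS = {"OK", "RBAD", "PBAD", "OOD"}

def aggregate_label(concept_labels):
    """Roll per-concept labels up into a binary utterance label."""
    assert all(label in CONCEPT_LABELS for label in concept_labels)
    return "GOOD" if all(label == "OK" for label in concept_labels) else "BAD"

# The example above: three OK concepts and one RBAD concept -> BAD.
print(aggregate_label(["OK", "OK", "OK", "RBAD"]))  # prints BAD
```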
Corpus – Summary
Started with 2 months of dialog sessions
Eliminated short, ill-formed sessions
Transcribed the corpus
Labeled it at the concept level
Discarded mixed-label utterances
Result: 4550 binary-labeled utterances from 311 dialogs
Features – Sources
Traditionally, features are extracted from the Speech Recognition layer [Chase].
In an SDS, there are at least two other orthogonal knowledge sources:
The Parser
The Dialog Manager
[Diagram: features drawn from three levels: Speech, Parsing, Dialog]
Features – Speech Recognition Level
WordNumber = number of words in the decoded utterance (11)
UnconfidentPerc = % of unconfident words (9%); this feature already captures other decoder-level features
Transcript: #noise# from London to Paris and then on to Toronto #noise#
Decoded: from London to Paris on then on to ?go? on to
Parse: depart_loc:[from London] arrive_loc:[to Paris] interj:[then] resume:[?go? on]
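A minimal sketch of how these two features could be computed, assuming (as in the example above) that the decoder marks unconfident words as ?word?:

```python
# Assumed implementation of the two speech-level features; the ?word?
# marking convention is taken from the running example on this slide.
def speech_features(decoded):
    words = decoded.split()
    word_number = len(words)
    unconfident = sum(1 for w in words if w.startswith("?") and w.endswith("?"))
    unconfident_perc = unconfident / word_number if word_number else 0.0
    return {"WordNumber": word_number, "UnconfidentPerc": unconfident_perc}

decoded = "from London to Paris on then on to ?go? on to"
print(speech_features(decoded))  # WordNumber=11, UnconfidentPerc=1/11 (~9%)
```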
Features – Parser Level
UncoveredPerc = % of words uncovered by the parse (36%)
GapNumber = # of unparsed fragments (3)
FragmentationScore = # of transitions between parsed and unparsed fragments (5)
Garble = flag computed by a heuristic rule based on parse coverage and fragmentation
Transcript: #noise# from London to Paris and then on to Toronto #noise#
Decoded: from London to Paris on then on to ?go? on to
Parse: depart_loc:[from London] arrive_loc:[to Paris] interj:[then] resume:[?go? on]
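The coverage features can be derived from a boolean mask marking which decoded words fall inside some parsed concept. The sketch below is an assumed implementation that reproduces the example's values (36%, 3, 5):

```python
# Assumed implementation of the parser-level coverage features, given a
# mask over the decoded words: True where a word is covered by the parse.
def parser_features(covered):
    uncovered_perc = covered.count(False) / len(covered)
    gaps = transitions = 0
    for i, c in enumerate(covered):
        if i > 0 and c != covered[i - 1]:
            transitions += 1           # parsed <-> unparsed boundary
        if not c and (i == 0 or covered[i - 1]):
            gaps += 1                  # start of a maximal unparsed run
    return {"UncoveredPerc": uncovered_perc,
            "GapNumber": gaps,
            "FragmentationScore": transitions}

# Mask for "from London to Paris on then on to ?go? on to": the parse
# covers "from London", "to Paris", "then", "?go? on".
mask = [True, True, True, True, False, True, False, False, True, True, False]
print(parser_features(mask))  # UncoveredPerc=4/11 (~36%), GapNumber=3, FragmentationScore=5
```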
Features – Parser Level (2)
ConceptBigram = bigram concept model score:
P(c1 … cn) ≈ P(cn | cn-1) · P(cn-1 | cn-2) · … · P(c2 | c1) · P(c1)
Probabilities trained from a corpus
ConceptNumber = number of concepts in the parse (4)
Transcript: #noise# from London to Paris and then on to Toronto #noise#
Decoded: from London to Paris on then on to ?go? on to
Parse: depart_loc:[from London] arrive_loc:[to Paris] interj:[then] resume:[?go? on]
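A sketch of the ConceptBigram score computed in log space; the probability tables would be estimated from the training corpus, and the values below are purely illustrative:

```python
# Assumed implementation of the bigram concept-model score; the
# probability tables here are placeholders, not trained estimates.
import math

def concept_bigram_score(concepts, p_first, p_bigram):
    """log P(c1..cn) = log P(c1) + sum over i of log P(ci | ci-1)."""
    score = math.log(p_first[concepts[0]])
    for prev, cur in zip(concepts, concepts[1:]):
        score += math.log(p_bigram[(prev, cur)])
    return score

p_first = {"depart_loc": 0.4}  # hypothetical probabilities
p_bigram = {("depart_loc", "arrive_loc"): 0.5,
            ("arrive_loc", "interj"): 0.1,
            ("interj", "resume"): 0.05}
print(concept_bigram_score(["depart_loc", "arrive_loc", "interj", "resume"],
                           p_first, p_bigram))
```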
Features – Dialog Manager Level
DialogState = the current state of the DM
StateDuration = for how many turns did the DM remain in the same state
TurnNumber = how many turns since the beginning of the session
ExpectedConcepts = indicates whether the decoded concepts match what the DM expects in its current state
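A hedged sketch of ExpectedConcepts; the expectation table below is hypothetical, since in Communicator it would come from the Dialog Manager's agenda for the current state:

```python
# Hypothetical state -> expected-concepts table, for illustration only.
EXPECTATIONS = {
    "query_depart_location": {"depart_loc", "arrive_loc"},
    "query_depart_time": {"depart_time", "depart_date"},
}

def expected_concepts(dialog_state, decoded_concepts):
    """True iff every decoded concept is expected in the current state."""
    expected = EXPECTATIONS.get(dialog_state, set())
    return all(c in expected for c in decoded_concepts)

print(expected_concepts("query_depart_location", ["depart_loc", "arrive_loc"]))  # True
print(expected_concepts("query_depart_time", ["arrive_loc"]))                    # False
```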
Features – Summary
12 features from 3 levels in the system:
Speech level features: WordNumber, UnconfidentPerc
Parsing level features: UncoveredPerc, FragmentationScore, GapNumber, Garble, ConceptBigram, ConceptNumber
Dialog management level features: DialogState, StateDuration, TurnNumber, ExpectedConcepts
Classification Techniques
Bayesian Networks
Boosting (AdaBoost)
Decision Trees
Artificial Neural Networks
Support Vector Machines
Naïve Bayes
Roadmap
Intro to Spoken Dialog Systems
The Problem: Detecting Misunderstandings
A Learning Approach
Training corpus
Features
Classification techniques
Experiments and Results
Conclusion
Experimental Setup
Performance metric: classification error rate
Two performance baselines:
“Random” baseline = 32.84%
“Heuristic” baseline = 25.69%
Used a 10-fold cross-validation process to:
Build confidence intervals for the error rates
Do statistical analysis of the differences in performance exhibited by the classifiers
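A re-creation of this protocol using scikit-learn (which postdates the original work); X would hold the 12 features, y the GOOD/BAD labels, and the classifier choices only approximate the ones compared here:

```python
# Sketch of 10-fold cross-validation with per-classifier error rates and
# rough confidence intervals. Bayesian networks are omitted because
# scikit-learn has no implementation; "features.npy"/"labels.npy" are
# hypothetical files.
import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.ensemble import AdaBoostClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.svm import SVC
from sklearn.neural_network import MLPClassifier
from sklearn.naive_bayes import GaussianNB

X, y = np.load("features.npy"), np.load("labels.npy")

for name, clf in [("AdaBoost", AdaBoostClassifier()),
                  ("Decision Tree", DecisionTreeClassifier()),
                  ("SVM", SVC()),
                  ("Neural Network", MLPClassifier(max_iter=1000)),
                  ("Naive Bayes", GaussianNB())]:
    scores = cross_val_score(clf, X, y, cv=10)  # 10-fold cross-validation
    errors = 1 - scores
    # Approximate 95% confidence interval over the 10 per-fold error rates.
    ci = 1.96 * errors.std(ddof=1) / np.sqrt(len(errors))
    print(f"{name}: {errors.mean():.2%} +/- {ci:.2%}")
```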
Results – Individual Features
Rank  Feature             Level                Mean Error
 1.   UncoveredPerc       Parsing              19.93%
 2.   ExpectedConcepts    Dialog Management    20.97%
 3.   GapNumber           Parsing              23.01%
 4.   ConceptBigram       Parsing              23.14%
 5.   Garble              Parsing/Recognition  25.32%
 6.   ConceptNumber       Parsing              25.69%
 7.   UnconfidentPerc     Recognition          27.34%
 8.   DialogState         Dialog Management    31.03%
 9.   WordNumber          Recognition          32.33%
10.   FragmentationScore  Parsing              32.73%
11.   StateDuration       Dialog Management    32.84%
12.   TurnNumber          Dialog Management    33.14%
Results – Classifiers
Classifier            Mean Error
Random Baseline       32.84%
“Heuristic” Baseline  25.69%
AdaBoost              16.59%
Decision Tree         17.32%
Bayesian Network      17.82%
SVM                   18.40%
Neural Network        18.90%
Naïve Bayes           21.65%
An In-Depth Look at Error Rates

                     Truth: OK   Truth: BAD
Classifier says OK      TP          FP
Classifier says BAD     FN          TN

FP = false acceptance; FN = false rejection
Error Rate = (FP + FN) / N
CDR = TN / (TN + FP) = 1 - FP / N_BAD (correct detection rate)
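These definitions translate directly into code; the counts below are made up for illustration:

```python
# Sketch of the metrics defined above, computed from confusion-matrix
# counts. FP = false acceptance (BAD classified OK), FN = false
# rejection (OK classified BAD).
def error_metrics(tp, fp, fn, tn):
    n = tp + fp + fn + tn
    error_rate = (fp + fn) / n
    cdr = tn / (tn + fp)  # correct detection rate = 1 - FP / N_BAD
    return {"ErrorRate": error_rate, "CDR": cdr}

# Hypothetical counts for illustration only.
print(error_metrics(tp=2800, fp=420, fn=230, tn=1100))
```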
Results – Classifiers (cont’d)
Classifier            Mean Error   F/P Rate   F/N Rate
Random Baseline       32.84%       32.84%     0.00%
“Heuristic” Baseline  25.32%       25.30%     0.02%
AdaBoost              16.59%       11.43%     5.16%
Decision Tree         17.32%       11.82%     5.49%
Bayesian Network      17.82%        9.41%     8.42%
SVM                   18.40%       15.01%     3.39%
Neural Network        18.90%       15.08%     3.82%
Naïve Bayes           21.65%       14.24%     7.41%

Correct detection rate: 77.4%
Conclusion
Spoken Dialog System performance is strongly impaired by misunderstandings
Increase the robustness of systems when faced with poor recognition:
Detect Misunderstandings
Use Recovery Techniques
Conclusion (cont’d)
Data-driven classification task:
Corpus
12 features from 3 levels in the system
Empirically compared 6 classification techniques
Data-Driven Misunderstanding Detector:
Significant improvement over the previous heuristic classifier
Correctly detects 74% of the misunderstandings
Future Work
Detect Misunderstandings:
Improve performance by adding new features
Identify the source of the error
Use Recovery Techniques:
Incorporate the confidence score into the Dialog Management process
Pointers
“Is This Conversation On Track?”, P. Carpenter, C. Jin, D. Wilson, R. Zhang, D. Bohus, A. Rudnicky, Proceedings of Eurospeech 2001, Aalborg, Denmark.
CMU Communicator: 1-412-268-1084
www.cs.cmu.edu/~dbohus/SDS