Wolfgang Wahlster
German Research Center for Artificial Intelligence, DFKI GmbH
Stuhlsatzenhausweg 3, 66123 Saarbruecken, Germany
phone: (+49 681) 302-5252/4162
fax: (+49 681) 302-5341
e-mail: [email protected]
WWW: http://www.dfki.de/~wahlster
Dagstuhl 2000
Pervasive Speech and Language Technology
Dagstuhl 2000 © Wolfgang Wahlster, DFKI
Pervasive Speech and Language Technology
A cappuccino in 10 minutes, please!
Send the following email to Mark Maybury: Hi Mark,
please forward the following agenda to your project
partners!
Let's go to Baker Street in Berkeley!
I would like to hear Mozart's Piano Concerto No. 3!
Speech-controlled coffee machine
Speech-based car navigation
Speech-enabled music selection
Dictation
Show me all CNN news of the last 3 months that
feature Bill Clinton discussing health care!
I would like to make an appointment with
Dr. Kuremastu in Kyoto next week!
Pervasive Speech and Language Technology
What has Jim Hendler said about DAML during our
recent Dagstuhl seminar?
Information on demand
Audio Mining
Speech-to-Speech Translation
Speech Input
Speech Recognition: What has the speaker said? (100 alternatives)
  Acoustic Language Models, Word Lists
Speech Analysis: What has the speaker meant? (10 alternatives)
  Grammar, Lexical Meaning
Speech Understanding: What does the speaker want? (unambiguous understanding in the dialog context)
  Discourse Context, Knowledge about Domain of Discourse
Reduction of Uncertainty
Three Levels of Language Processing
Increasing complexity along four dimensions: Input Conditions, Naturalness, Adaptability, Dialog Capabilities
1. Close-speaking microphone/headset, push-to-talk; isolated words; speaker dependent; monolog dictation
2. Telephone, pause-based segmentation; read continuous speech; speaker independent; information-seeking dialog
3. Open microphone, GSM quality; spontaneous speech; speaker adaptive; multiparty negotiation (Verbmobil)
Challenges for Language Engineering
Wann fährt der nächste Zug nach Hamburg ab?
When does the next train to Hamburg depart?
Wo befindet sich das nächste Hotel?
Where is the nearest hotel?
Context-Sensitive Speech-to-Speech Translation
Verbmobil Server
Mobile Speech-to-Speech Translation of Spontaneous Dialogs
As the name Verbmobil suggests, the system supports verbal communication with foreign dialog partners in mobile situations:
1. face-to-face conversations
2. telecommunication
Mobile Speech-to-Speech Translation of Spontaneous Dialogs
Verbmobil Speech Translation Server
Solution: a conference call. The Verbmobil Speech Translation Server is accessed from GSM mobile phones.
General Speech Recognition Task
Audio Signal → Recognizers (German, English, Japanese) → Word Hypotheses Graph
Word Hypotheses Graphs (WHGs)
WHGs realize the interface between acoustic and linguistic processing
Edge = Word
Best Hypothesis
Acoustic Score
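A minimal sketch of the idea, assuming a toy graph whose node numbers, words, and scores are invented for illustration (they are not Verbmobil data): the WHG is stored as a list of word edges between time-ordered nodes, and the best hypothesis is the path with the highest summed acoustic score.

```python
from collections import defaultdict

# Hypothetical toy WHG: (start_node, end_node, word, acoustic_score).
# Nodes are time-ordered integers; scores are log-probabilities (higher is better).
EDGES = [
    (0, 1, "I", -1.0),
    (0, 1, "eye", -2.5),
    (1, 2, "need", -1.2),
    (1, 2, "knead", -3.0),
    (2, 3, "a", -0.5),
    (3, 4, "car", -1.1),
    (2, 4, "acre", -2.9),
]


def best_hypothesis(edges, start, end):
    """Return (score, words) of the highest-scoring path from start to end."""
    outgoing = defaultdict(list)
    for src, dst, word, score in edges:
        outgoing[src].append((dst, word, score))

    # best[node] = (total score, word sequence) of the best path reaching node.
    best = {start: (0.0, [])}
    # Edges always point forward in time, so visiting nodes in increasing
    # order is a valid topological order.
    for node in sorted({e[0] for e in edges} | {e[1] for e in edges}):
        if node not in best:
            continue
        total, words = best[node]
        for dst, word, score in outgoing[node]:
            candidate = (total + score, words + [word])
            if dst not in best or candidate[0] > best[dst][0]:
                best[dst] = candidate
    return best[end]


score, words = best_hypothesis(EDGES, 0, 4)
print(" ".join(words), round(score, 2))
```

Later processing stages would not keep only this single path; the point of the graph is that the competing edges stay available to linguistic processing.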
Massive Data Collection Efforts
The so-called Partitur (German word for musical score) orchestrates fifteen strata of annotations:
Transliteration Variant 1
Transliteration Variant 2
Lexical Orthography
Canonical Pronunciation
Manual Phonological Segmentation
Automatic Phonological Segmentation
Word Segmentation
Prosodic Segmentation
Dialog Acts
Noises
Superimposed Speech
Syntactic Category
Word Category
Syntactic Function
Prosodic Boundaries
3,200 dialogs (182 hours) with 1,658 speakers and 79,562 turns, distributed on 56 CDs (21.5 GB)
Machine Learning for the Integration of Statistical Properties into Symbolic Models for Speech Recognition, Parsing, Dialog Processing, Translation
Transcribed Speech Data → Hidden Markov Models
Segmented Speech with Prosodic Labels → Neural Nets, Multilayered Perceptrons
Annotated Dialogs with Dialog Acts → Probabilistic Automata
Treebanks & Predicate-Argument Structures → Probabilistic Grammars
Aligned Bilingual Corpora → Probabilistic Transfer Rules
Extracting Statistical Properties from Large Corpora
From a Multi-Agent Architecture to a Multi-Blackboard Architecture
Multi-Agent Architecture (modules M1 to M6 connected directly):
  Each module must know which module produces what data
  Direct communication between modules
  Each module has only one instance
  Heavy data traffic for moving copies around
  Multiparty and telecooperation applications are impossible
  Software: ICE and ICE Master
  Basic Platform: PVM
Multi-Blackboard Architecture (modules M1 to M6 around blackboards BB 1 to BB 3):
  All modules can register for each blackboard dynamically
  No direct communication between modules
  Each module can have several instances
  No copies of representation structures (word lattice, VIT chart)
  Multiparty and telecooperation applications are possible
  Software: PCA and Module Manager
  Basic Platform: PVM
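The blackboard properties listed above can be sketched in a few lines. This is an illustration only: the class and module names (`Blackboard`, `Parser`, `chunk-parser`, `hpsg-parser`) are hypothetical and have nothing to do with the actual PCA or Module Manager software, whose interfaces the slides do not show.

```python
class Blackboard:
    """Holds one kind of shared representation and notifies registered modules."""

    def __init__(self, name):
        self.name = name
        self.entries = []      # one shared store, no per-module copies
        self.subscribers = []  # callbacks of modules registered at runtime

    def register(self, callback):
        self.subscribers.append(callback)

    def publish(self, producer, data):
        self.entries.append((producer, data))
        for callback in list(self.subscribers):
            callback(producer, data)


class Parser:
    """Toy module: reads from one blackboard, writes to another."""

    def __init__(self, name, source, target):
        self.name = name
        self.target = target
        source.register(self.on_data)

    def on_data(self, producer, words):
        # Stand-in for real parsing: just record which words were seen.
        self.target.publish(self.name, {"parsed": list(words)})


word_lattice = Blackboard("word lattice")
vit_chart = Blackboard("VIT chart")

# Several module instances register on the same blackboard.
Parser("chunk-parser", word_lattice, vit_chart)
Parser("hpsg-parser", word_lattice, vit_chart)

# The recognizer only knows the blackboard, not who consumes its output.
word_lattice.publish("recognizer", ["I", "need", "a", "car"])

for producer, data in vit_chart.entries:
    print(producer, data)
```

Because producers and consumers meet only at the blackboard, adding a further parser instance requires no change to the recognizer.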
A Multi-Blackboard Architecture for the Combination of Results from Deep and Shallow Processing Modules
Blackboards: Audio Data; Word Hypotheses Graph with Prosodic Labels; VITs (Underspecified Discourse Representations)
Modules: Command Recognizer, Spontaneous Speech Recognizer, Channel/Speaker Adaptation, Prosodic Analysis, Statistical Parser, Dialog Act Recognition, Chunk Parser, HPSG Parser, Semantic Construction, Robust Dialog Semantics, Semantic Transfer, Generation
The Use of Prosodic Information at All Processing Stages
Speech Signal and Word Hypotheses Graph → Multilingual Prosody Module
Prosodic features: duration, pitch, energy, pause
Prosodic information passed on: boundary information, sentence mood, accented words, prosodic feature vector
Parsing: Search Space Restriction
Dialog Understanding: Dialog Act Segmentation and Recognition
Translation: Constraints for Transfer
Generation: Lexical Choice
Speech Synthesis: Speaker Adaptation
Competing Strategies for Robust Speech Translation
Concurrent processing modules combine deep semantic translation with shallow surface-oriented translation methods.
Input: Word Lattice
Expensive, but precise translation:
  Principled and compositional syntactic and semantic analysis
  Semantic-based transfer of Verbmobil Interface Terms (VITs) as a set of underspecified DRSs
Cheap, but approximate translation:
  Case-based translation
  Dialog-act based translation
  Statistical translation
Each path: timeout? → results with confidence values → selection of best result
Goal: Acceptable Translation Rate
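One way to picture the interplay of timeout and confidence values is the following sketch. It is a deterministic simulation, not the Verbmobil control logic: the `Result` type, the `select_best` function, the processing times, the confidence values, and the sample translations are all invented for illustration, and confidences are simply assumed to be comparable across engines.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Result:
    engine: str
    translation: str
    confidence: float  # assumed to be comparable across engines
    seconds: float     # simulated processing time


def select_best(results: List[Result], timeout: float) -> Optional[Result]:
    """Pick the most confident result among engines that finished in time."""
    finished = [r for r in results if r.seconds <= timeout]
    if not finished:
        return None
    return max(finished, key=lambda r: r.confidence)


results = [
    Result("semantic transfer", "Let us meet on Monday.", 0.9, 4.0),
    Result("case-based", "We meet Monday.", 0.7, 0.5),
    Result("dialog-act based", "Suggestion: Monday.", 0.4, 0.2),
    Result("statistical", "Let us Monday meet.", 0.6, 0.3),
]

# With a generous time budget the deep path wins; with a tight one,
# the best shallow result is used instead.
print(select_best(results, timeout=5.0).engine)
print(select_best(results, timeout=1.0).engine)
```

The shallow engines act as a safety net: some translation is available even when the expensive path misses its deadline.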
Integrating Shallow and Deep Analysis Components in a Multi-Blackboard Architecture
Augmented Word Hypotheses Graph → Chunk Parser, Statistical Parser, HPSG Parser
Each parser delivers partial VITs → Chart with a combination of partial VITs
Robust Dialog Semantics: combination and knowledge-based reconstruction of complete VITs
Result: Complete and Spanning VITs
Incremental chart construction and anytime processing
Rule-based combination and transformation of partial UDRS coded as VITs
Selection of a spanning analysis using a bigram model for VITs (trained on a treebank of 24 k VITs)
Chart Parser using cascaded finite-state transducers
Statistical LR parser trained on treebank
Very fast HPSG parser
SemanticConstruction
VHG: A Packed Chart Representation of Partial Semantic Representations
I need a car next Tuesday oops Monday
Original Utterance: Reparandum
Editing Phase: Hesitation
Repair Phase: Reparans
Recognition of Substitutions
Transformation of the Word Hypothesis Graph
I need a car next Monday
Verbmobil technology understands speech repairs and extracts the intended meaning.
Dictation systems such as ViaVoice, VoiceXpress, FreeSpeech, and Naturally Speaking cannot deal with spontaneous speech and transcribe the corrupted utterances as spoken.
The Understanding of Spontaneous Speech Repairs
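The reparandum / hesitation / reparans pattern can be illustrated with a small token-level sketch. It is a strong simplification: the slides describe a transformation of the word hypothesis graph, whereas this sketch works on a single token sequence, and the editing terms, the `WORD_CLASS` table, and the matching heuristic are assumptions made for the example.

```python
EDITING_TERMS = {"oops", "äh", "uh", "er"}

# Hypothetical word classes; a real system would use lexical and acoustic cues.
WORD_CLASS = {
    "Monday": "weekday", "Tuesday": "weekday",
    "Mannheim": "city", "Saarbrücken": "city",
}


def repair(tokens):
    """Replace the reparandum by the reparans around the first editing term."""
    for i, token in enumerate(tokens):
        if token in EDITING_TERMS:
            break
    else:
        return list(tokens)  # no editing term, nothing to repair

    before, reparans = list(tokens[:i]), list(tokens[i + 1:])
    if not before or not reparans:
        return before + reparans

    first = reparans[0]
    # The reparandum starts at the last earlier word that is either repeated
    # at the start of the reparans or belongs to the same word class.
    for start in range(len(before) - 1, -1, -1):
        word = before[start]
        cls = WORD_CLASS.get(word)
        if word == first or (cls is not None and cls == WORD_CLASS.get(first)):
            return before[:start] + reparans
    return before + reparans


print(" ".join(repair("I need a car next Tuesday oops Monday".split())))
print(" ".join(repair("Wir treffen uns in Mannheim äh in Saarbrücken".split())))
```

The first call substitutes one weekday for another; the second uses the repeated preposition as the anchor for the replacement.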
German: Wir treffen uns in Mannheim, äh, in Saarbrücken.
(We are meeting in Mannheim, oops, in Saarbruecken.)
English: We are meeting in Saarbruecken.
Automatic Understanding and Correction of Speech Repairs in Spontaneous Telephone Dialogs
The preposition "in" is missing in all paths through the word hypotheses graph. A temporal NP is transformed into a temporal modifier using an underspecified temporal relation:
[temporal_np(V1)] → [typeraise_to_mod(V1, V2)] & V2
The modifier is applied to a proposition:
[type(V1, prop), type(V2, mod)] → [apply(V2, V1, V3)] & V3
Let us meet the late afternoon to catch the train to Frankfurt
Let us meet (in) the late afternoon to catch the train to Frankfurt
Robust Dialog Semantics: Combining and Completing Partial Representations
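The two rules above can be mimicked on a toy representation. This is only a sketch under stated assumptions: `Partial`, `complete`, and the tuple encoding of the results are hypothetical stand-ins, and real VITs are far richer than the strings used here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partial:
    kind: str        # "temporal_np", "mod", or "prop"
    content: object  # toy payload standing in for a partial VIT


def typeraise_to_mod(np):
    """Rule 1: temporal NP to temporal modifier with an underspecified relation."""
    return Partial("mod", ("temporal_rel", np.content))


def apply_mod(mod, prop):
    """Rule 2: apply a modifier to a proposition, yielding a new proposition."""
    return Partial("prop", (prop.content, mod.content))


def complete(partials):
    """Raise every temporal NP, then fold all modifiers into the proposition."""
    raised = [typeraise_to_mod(p) if p.kind == "temporal_np" else p
              for p in partials]
    props = [p for p in raised if p.kind == "prop"]
    mods = [p for p in raised if p.kind == "mod"]
    if len(props) != 1:
        return raised  # nothing sensible to combine in this toy version
    result = props[0]
    for mod in mods:
        result = apply_mod(mod, result)
    return [result]


fragments = [
    Partial("prop", "meet(we)"),
    Partial("temporal_np", "the late afternoon"),
]
print(complete(fragments))
```

The relation stays underspecified (`temporal_rel`), which mirrors the fact that the missing preposition is never recovered from the signal; only the semantic link is reconstructed.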
Integrating Deep and Shallow Processing: Combining Results from Concurrent Translation Threads
Alternative translations with confidence values from: Statistical Translation, Dialog-Act Based Translation, Semantic Transfer, Case-Based Translation
Selection Module
Segment 1: "If you prefer another hotel," (translated by Semantic Transfer)
Segment 2: "please let me know." (translated by Case-Based Translation)
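A per-segment selection step might look like the sketch below. The confidence numbers and the losing alternative translations are invented for the example; only the two winning segments and their engines come from the slide, and `select_per_segment` is a hypothetical name.

```python
def select_per_segment(candidates):
    """candidates: one dict per segment, mapping engine -> (translation, confidence).

    Returns the (engine, translation) pair with the highest confidence
    for every segment, in segment order.
    """
    chosen = []
    for segment in candidates:
        engine, (text, _confidence) = max(
            segment.items(), key=lambda item: item[1][1]
        )
        chosen.append((engine, text))
    return chosen


candidates = [
    {
        "semantic transfer": ("If you prefer another hotel,", 0.85),
        "statistical": ("If you another hotel prefer,", 0.60),
    },
    {
        "case-based": ("please let me know.", 0.90),
        "semantic transfer": ("please inform me.", 0.70),
    },
]

chosen = select_per_segment(candidates)
print(" ".join(text for _, text in chosen))
```

Selecting per segment rather than per utterance is what allows one output sentence to mix results from different translation threads.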
Sentence to synthesize: I have time on monday.
[Figure: graph over the tokens "I", "have", "time", "on", "monday" with candidate units as edges, directed from start node S to end node E]
Unit Selection Algorithm
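The slide does not state the cost function, so the following sketch assumes the simplest possible criterion: cover the sentence with as few recorded units as possible, which favors long units and few concatenation points. The inventory is hypothetical, and practical unit selection typically also weighs target and join costs.

```python
def select_units(tokens, inventory):
    """Cover tokens with the fewest inventory units (dynamic programming)."""
    n = len(tokens)
    # best[i] = shortest list of units covering tokens[:i], or None if impossible.
    best = [None] * (n + 1)
    best[0] = []
    for end in range(1, n + 1):
        for start in range(end):
            unit = tuple(tokens[start:end])
            if best[start] is not None and unit in inventory:
                candidate = best[start] + [unit]
                if best[end] is None or len(candidate) < len(best[end]):
                    best[end] = candidate
    return best[n]


# Hypothetical inventory of recorded units (word sequences).
INVENTORY = {
    ("I",), ("have",), ("time",), ("on",), ("monday",),
    ("I", "have"), ("I", "have", "time"), ("on", "monday"),
}

tokens = "I have time on monday".split()
print(select_units(tokens, INVENTORY))
```

Each inventory unit corresponds to an edge between token positions, so the result is a shortest path from the start node S to the end node E.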
Microphone
Push-to-talk Switch
Please call Doris Wahlster.
Open the left window in the back.
I want to hear the weather channel.
When will I reach the next gas station?
Where is the next parking lot?
Speech control of: cellular phone, radio, windows / AC, route guidance system
Option for S-, C-, and E-Class of Mercedes and BMW
Speaker-independent; garbage models for non-speech (blinker, AC, wheels)
Linguatronic: Spoken Dialogs with Mercedes-Benz
International Research Trends in Multilingual Systems
Multilingual and Mobile Communication Assistants
Multimodal Interfaces
SmartKom
Speech-based Web Access to Multilingual Web pages
WAP Phones, WebTV
Multilingual Audio Retrieval and Audio Mining
Discussions, Lecture Notes, Organizers
Multilingual Indexing and Annotation of Videos
Video Archives, News Archives
Call Centers, ECommerce, Mobile Travel Assistance, Telephone Translations
Verbmobil
Dialog Translation
Multilingual Language Technology: Speech Recognition, Language Understanding, Language Generation, and Speech Synthesis
Spontaneous Speech, Robust Processing and Translation, Semantic and Pragmatic Understanding
Real-world problems in language technology, such as the understanding of spoken dialogs, speech-to-speech translation, and multimodal dialog systems, can only be cracked by the combined muscle of deep and shallow processing approaches.
In a multi-blackboard architecture based on packed representations at all processing levels (speech recognition, parsing, semantic processing, translation, generation), using charts with underspecified representations (e.g. UDRS), the results of concurrent processing threads can be combined incrementally.
Conclusion I
All results of concurrent processing modules should come with a confidence value, so that a selection module can choose the most promising result at each processing stage.
Packed representations together with formalisms for underspecification capture the uncertainties in each processing phase, so that the uncertainties can be reduced by linguistic, discourse, and domain constraints as soon as they become applicable.
Conclusion II
Deep Processing can be used for merging, completing and repairing the results of shallow processing strategies.
Shallow methods can be used to guide the search in deep processing.
Statistical methods must be augmented by symbolic models (e.g. class-based language modelling, word order normalization as part of statistical translation).
Statistical methods can be used to learn operators or selection strategies for symbolic processes.
It is much more than a balancing act... (see Klavans and Resnik 1996)
Conclusion III
Open Problems for the Next Decade
Problems with current machine learning approaches
Expensive data collection
Cognitively unrealistic training data
Data sparseness
Problems with current hand-crafted knowledge sources
Brittleness
Domain dependence
Limited scalability
A Speculative Conclusion (+50 years)
-500 years: Oral Society. News and knowledge are passed orally. No mass storage, no automatic processing, no automatic retrieval.
TODAY: Textual Society. News and knowledge are passed textually. Mass storage of texts, text processing, text retrieval.
+50 years: Oral Society. News and knowledge are passed orally. Mass storage of speech, speech processing, audio retrieval.