TRANSCRIPT
January 26, 2000
1
Stochastic Transductions for Machine Translation*
Giuseppe Riccardi, AT&T Labs-Research
*Joint work with Srinivas Bangalore and Enrico Bocchieri
2
Overview
• Motivation
• Stochastic Finite State Machines
• Learning Machine Translation Models
• Case study
– MT for Human-Machine Spoken Dialog
• Experiments and Results
3
Speech Understanding Case
• ATIS 1994 DARPA Evaluation (G. Riccardi et al., ICASSP 1995; E. Bocchieri et al., SLT Workshop 1995; Levin et al., SLT Workshop 1995)
[Figure: speech understanding as a transducer chain. The speech recognizer combines the acoustic model A, computing P(A|W), the syntax/language model L, computing P(W|C), and the semantics C, computing P(C); decoding maximizes P(A,W,C) over the minimized composition min(A ∘ L ∘ C).]
4
Motivation
• Finite State Transducers (FST)
  – Unified formalism to represent symbolic transductions
• Learnability
  – Automatically train transductions from (parallel) corpora
• Speech-to-Speech Machine Translation chain
  – Combining speech and language sciences
[Figure: the speech and language finite-state transduction chain (machines labeled T, B, X, L, C in the original diagram) mapping Source Spoken Language to Target Spoken Language.]
5
Overview
• Motivation
• Stochastic Finite State Machines
• Learning Machine Translation Models
• Case study
– MT for Human-Machine Spoken Dialog
• Experiments and Results
6
Finite State Transducers
• Weighted Finite State Transducers (FST)
  – Algebraic operations ($\tau_1 + \tau_2$, $\tau_1 \circ \tau_2$, …)
  – Composable: E-S = E-J ∘ J-S
  – Minimization: min($\tau_1$)
  – Stochastic transductions: $\tau_{E\text{-}S} : E^* \times S^* \rightarrow [0,1]$
• Joint probability decomposition (and computation):
  $P(X_1, X_2, \ldots, X_N) = P(X_1)\, P(X_2 \mid X_1) \cdots P(X_N \mid X_{N-1})$
[Figure: a chain of composed transducers $\tau_1 \circ \tau_2 \circ \cdots \circ \tau_N$.]
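The composability bullet (E-S = E-J ∘ J-S) can be made concrete with a toy sketch. This is not the AT&T FSM library: each transduction is reduced to a flat phrase table, `compose` is a hypothetical helper, and all entries and probabilities are invented for illustration.

```python
# A minimal sketch of composing two stochastic transductions.
# Each transduction is a flat table {input: [(output, prob), ...]} rather
# than a real weighted FST; composition multiplies weights along chained
# pairs, which is what FST composition does path-by-path.

def compose(t1, t2):
    """Compose transductions: (x -> y) in t1 chained with (y -> z) in t2."""
    result = {}
    for x, outs in t1.items():
        chained = {}
        for mid, p1 in outs:
            for z, p2 in t2.get(mid, []):
                chained[z] = chained.get(z, 0.0) + p1 * p2
        result[x] = sorted(chained.items(), key=lambda kv: -kv[1])
    return result

# Toy English-Japanese and Japanese-Spanish entries (invented):
e_j = {"credit card": [("kurejitto kaado", 0.9), ("kaado", 0.1)]}
j_s = {"kurejitto kaado": [("tarjeta de credito", 1.0)],
       "kaado": [("tarjeta", 1.0)]}

e_s = compose(e_j, j_s)    # E-S = E-J o J-S
print(e_s["credit card"])  # [('tarjeta de credito', 0.9), ('tarjeta', 0.1)]
```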
7
Stochastic Transducers
[Figure: a stochastic transducer fragment. A single arc set from state 1 to state 2 maps the input word "cool" to alternative output tags with weights: cool:<adv>/0.3, cool:<adj>/0.3, cool:<noun>/0.2, cool:<verb>/0.2.]
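Rendering the figure as data, here is a minimal sketch of the ambiguous arc set; the list representation and the `transduce` helper are invented for illustration.

```python
# The four weighted arcs of the figure as (input, output, probability);
# the weights over the outputs for "cool" sum to 1.
arcs = [
    ("cool", "<adv>",  0.3),
    ("cool", "<adj>",  0.3),
    ("cool", "<noun>", 0.2),
    ("cool", "<verb>", 0.2),
]

def transduce(word):
    """All (output, probability) pairs the transducer allows for one input."""
    return [(out, p) for (inp, out, p) in arcs if inp == word]

print(transduce("cool"))
# [('<adv>', 0.3), ('<adj>', 0.3), ('<noun>', 0.2), ('<verb>', 0.2)]
```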
8
Learning FSTs
• Data-driven learning of FSTs from large corpora
• Learnability
  – Finite history context (N-gram)
  – Generalization
    • Unseen event modeling (back-off)
    • Class-based n-grams
    • Phrase grammar (long-distance dependency)
    • Context-free grammar approximation
• Large-scale transductions
  – Efficient state and transition function model
  – Variable Ngram Stochastic Machine (VNSM)
9
VNSM: the state and transition space
• Bottom-up approach
  – Each state is associated with an n-tuple in the corpus
  – Each transition is associated with adjacent strings
  – Parametrization:
    #states ∝ #n-tuples in the corpus
    #transitions ∝ #n-tuples in the corpus
    #ε-transitions ∝ #(n-1)-tuples in the corpus
• VNSM recognizes $W \in V^*$ (V is the dictionary)
10
VNSM: Unseen Event Modeling
(the power of amnesia/reminiscence)
[Figure: predicting w4 by backing off through progressively shorter histories via ε-transitions: History = w2, w3 → History = w3 → History = "".]
11
VNSM: probability distributions
• Probability distribution over $W \in V^*$:
  $P(W) = \prod_{i=1}^{N} P(w_i \mid w_{i-k_i}, \ldots, w_{i-1})$
  where $W = w_1 \ldots w_N$ and $k_i \in (1, n)$ is the "variable" context length
• Parameter tying
• Probability training
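A hedged sketch of the variable-context lookup: the model stores distributions for contexts of several lengths and, at each position, uses the longest context actually seen. A real VNSM attaches trained back-off weights to its ε-transitions; this toy version simply falls through to shorter histories, and all numbers and names are invented.

```python
# Toy variable n-gram model: P(w | context) stored for contexts of
# different lengths; lookup uses the longest context available.
model = {
    ("president", "of", "united"): {"states": 0.9, "airlines": 0.1},
    ("of", "united"):              {"states": 0.8, "airlines": 0.2},
    ("united",):                   {"states": 0.6, "airlines": 0.3},
    ():                            {"the": 0.05, "states": 0.01},
}

def prob(word, history, n=4):
    """P(w_i | w_{i-k_i}, ..., w_{i-1}) with the longest seen context."""
    for k in range(min(n - 1, len(history)), -1, -1):
        ctx = tuple(history[len(history) - k:])
        if ctx in model and word in model[ctx]:
            return model[ctx][word]
    return 0.0

def sentence_prob(words, n=4):
    """P(W) as the product of variable-context word probabilities."""
    p = 1.0
    for i, w in enumerate(words):
        p *= prob(w, words[:i], n)
    return p

print(prob("states", ["the", "president", "of", "united"]))  # 0.9
```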
12
Overview
• Motivation
• Stochastic Finite State Machines
• Learning Machine Translation Models
• Case study
– MT for Human-Machine Spoken Dialog
• Experiments and Results
14
Stochastic FST Machine Translation
• Decompose language translation into two independent processes:
  – Lexical Choice: searching the target language words
  – Word Reordering: searching the correct word order
• Modeling the two processes as stochastic finite state transductions
• Learning the transductions from bilingual corpora
[Figure: the speech and language finite-state transduction chain (machines T, B, X, L, C) from Source Spoken Language to Target Spoken Language.]
15
Stochastic Machine Translation
• Noisy-channel paradigm (IBM):
  $\hat{W}_T = \arg\max_{W_T} P(W_S \mid W_T)\, P(W_T)$
• Stochastic Finite State Transducer Model:
  – Lexical Choice Problem: $\hat{W}_T = \arg\max_{W_T} P(W_S, W_T)$
  – Word Reordering Problem: $\hat{W}_T^R = \arg\max_{W_T \in R(\hat{W}_T)} P(W_T)$
[Figure: the MT transducer maps the source sentence $W_S$ to the target sentence $W_T$.]
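The two argmax problems can be sketched end to end. Everything below is an invented stand-in for the trained transducers: a toy joint table plays the lexical choice model, a toy scorer plays the target language model, and reordering is done by brute force rather than by FST composition.

```python
from itertools import permutations

def lexical_choice(w_s, joint):
    """Lexical Choice: argmax over W_T of the joint P(W_S, W_T)."""
    cands = joint[w_s]
    return max(cands, key=cands.get)

def word_reorder(w_t_hat, lm_score):
    """Word Reordering: argmax of P(W_T) over reorderings R of the
    lexical-choice output."""
    return " ".join(max(permutations(w_t_hat.split()), key=lm_score))

# Toy joint model and target LM (all numbers invented):
joint = {"tarjeta de credito": {"card credit": 0.6, "credit card": 0.3}}
lm_score = lambda ws: 1.0 if " ".join(ws) == "credit card" else 0.1

w_t = lexical_choice("tarjeta de credito", joint)  # -> "card credit"
print(word_reorder(w_t, lm_score))                 # -> "credit card"
```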
17
Learning Stochastic Transducers
• Given the input-output pair training set:
  $(W_S, W_T) = ((x_1, x_2, \ldots, x_N), (y_1, y_2, \ldots, y_M))$
• Align the input and output language sequences:
  $F(W_S, W_T) = (x_1, y_1)(x_2, y_2) \ldots (x_N, y_N)$, with each $y_i \in \{y_1, \ldots, y_M\} \cup \{\epsilon\}$
• Estimate the joint probability via VNSMs
• Local reordering
• Sentence-level reordering
18
Pairing and Aligning (1)
• Source-target language pairs
• Sentence Alignment
  – Automatic algorithm (Alshawi, Bangalore and Douglas, 1998)
Spanish : ajá quiero usar mi tarjeta de crédito
English : yeah I wanna use my credit card
Alignment : 1 3 4 5 7 0 6
19
Learning SFST from Bi-language
• Bi-language: each token consists of a source language word paired with its target language word.
• Ordering of tokens: source language order or target language order
• $W_S$: ajá quiero usar mi tarjeta de crédito
• $W_T$: yeah I wanna use my credit card
• $F(W_S, W_T)$: (ajá,yeah) (ε,I) (quiero,wanna) (usar,use) (mi,my) (tarjeta,card) (de,ε) (crédito,credit)
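As a concrete sketch of $F(W_S, W_T)$, the following turns the alignment from the previous slide into the bi-language tokens shown here. The real alignment comes from the algorithm of Alshawi, Bangalore and Douglas (1998); only the token construction is sketched, and the helper name and epsilon convention are invented.

```python
def bilanguage(source, target, alignment):
    """Build bi-language tokens from an alignment that gives, for each
    source word, its 1-based target position (0 = no target word, i.e.
    epsilon on the target side). Target words left unaligned become
    (epsilon, y) tokens, as in the (ε, I) example above."""
    eps, aligned = "ε", set(alignment)
    tokens, emitted = [], set()
    for s_word, t_idx in zip(source, alignment):
        if t_idx == 0:
            tokens.append((s_word, eps))
            continue
        for j in range(1, t_idx):            # flush unaligned target words
            if j not in emitted and j not in aligned:
                tokens.append((eps, target[j - 1]))
                emitted.add(j)
        tokens.append((s_word, target[t_idx - 1]))
        emitted.add(t_idx)
    return tokens

src = "ajá quiero usar mi tarjeta de crédito".split()
tgt = "yeah I wanna use my credit card".split()
print(bilanguage(src, tgt, [1, 3, 4, 5, 7, 0, 6]))
# [('ajá','yeah'), ('ε','I'), ('quiero','wanna'), ('usar','use'),
#  ('mi','my'), ('tarjeta','card'), ('de','ε'), ('crédito','credit')]
```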
20
Learning Bilingual Phrases
• Effective translation of text chunks (e.g. collocations)
• Learn bilingual phrases
  – Joint entropy minimization on the bi-language corpus
  – Weighted Mutual Information to rank bilingual phrases
• Phrase-based VNST
• Local reordering of phrases
• Phrase segmentation, e.g. $h((x_1,y_1)\ldots(x_N,y_N)) = (x_1 x_2 x_3,\ y_1 y_2 y_3)(x_4 x_5,\ y_3 y_4)\ldots$
[Figure: "una llamada de larga distancia" → VNST → "a call long distance" → Local Reordering → "a long distance call".]
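The ranking step can be sketched by scoring candidate bilingual phrases with a weighted mutual information, here in a simplified pointwise form over bi-phrase counts. The exact criterion in the original system may differ; the counts, candidates, and function name are invented.

```python
import math

def weighted_mi(bi, src, tgt, total):
    """Rank a candidate bilingual phrase (s, t) by a weighted pointwise
    mutual information, P(s,t) * log( P(s,t) / (P(s) P(t)) ).
    A simplified stand-in for the ranking criterion on the slide."""
    p_st, p_s, p_t = bi / total, src / total, tgt / total
    return p_st * math.log(p_st / (p_s * p_t))

# Toy co-occurrence counts over a bi-language corpus (invented):
total = 10000
candidates = {  # (source phrase, target phrase): (bi, src, tgt) counts
    ("de larga distancia", "long distance"): (80, 90, 85),
    ("de", "the"): (120, 900, 1100),
}
for (s, t) in sorted(candidates,
                     key=lambda c: -weighted_mi(*candidates[c], total)):
    print(f"{s!r} <-> {t!r}")
# The collocation outranks the frequent but uninformative word pairing.
```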
21
Local Reordering
• Spanish reordered phrase = min(S ∘ TLM)
  – A full word-permutation machine over the phrase $(y_1 y_2 \ldots y_m)$ is too expensive; S is the "sausage" machine
  – TLM is the target language model
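A toy rendering of min(S ∘ TLM): the sausage S is reduced to a bag of phrase words, the TLM to a table of bigram scores, and the best path is found by exhaustive search. The real system composes the sausage machine with the LM as weighted FSTs precisely to avoid enumerating permutations; all scores here are invented.

```python
from itertools import permutations

# Toy bigram log-probabilities for the target language model (invented):
bigram = {
    ("<s>", "a"): -0.5, ("a", "long"): -0.7, ("long", "distance"): -0.4,
    ("distance", "call"): -0.6, ("call", "</s>"): -0.5,
    ("a", "call"): -1.5, ("call", "long"): -3.0, ("distance", "</s>"): -2.5,
}

def lm_logprob(words):
    """Bigram log P(W_T), with a harsh penalty for unseen bigrams."""
    seq = ["<s>"] + list(words) + ["</s>"]
    return sum(bigram.get(p, -10.0) for p in zip(seq, seq[1:]))

def best_order(bag):
    """min(S o TLM) in miniature: the sausage S accepts any ordering of
    the phrase's words; taking the TLM-best path amounts to picking the
    LM-best permutation."""
    return " ".join(max(permutations(bag), key=lm_logprob))

print(best_order(["call", "a", "long", "distance"]))  # a long distance call
```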
22
Lexical Reordering
• Output of the lexical choice transducer: a sequence of target language phrases
  – like to make I'd call a calling card please
• Words within each phrase are already in target language word order.
• However, the phrases themselves still need to be reordered into target language word order.
• Reordered:
  – I'd like to make a calling card call please
23
Lexical Reordering Models
• Tree-based model
  – Impose a tree structure on a sentence (Alshawi et al., ACL 1998)
  – English: I'd like to charge this to my home phone
24
Lexical Reordering Models
• Reordering using tree-local reordering rules.
• Eng-Jap (lexical choice output): 私は したいのです チャージ これを 私の 家の 電話に
[Figure: two dependency trees over the nodes 私は (I), これを (this), したいのです (like), チャージ (charge), 家の (home), 私の (my), 電話に (phone): one in the source word order and one after the tree-local reordering rules have applied.]
• Japanese (target order): 私は これを 私の 家の 電話に チャージ したいのです
25
Lexical Reordering Models (contd.)
• Dependency tree represented as a bracketed string (bounded) with reordering instructions:
  :[ したいのです : したいのです :-1 :[ チャージ : チャージ :] :]
• Training VNSTs from the bracketed corpus
• Output of the lexical reordering VNST: strings with reordering instructions
• Instructions are composed with an "interpreter" FST to form the target language sentence
26
Tree Reordering
• Sentence-level reordering
  – Mapping sentence tree structures
• English: my card credit (Spanish order) → English: my credit card (English order)
[Figure: the dependency tree with head "card" and children "my", "credit" is annotated with head-relative positions. Transduction 1 maps the string to a tree; Transduction 2 (from alignment statistics) maps the Spanish-order positions (my: -1, credit: +1) to the English-order positions (my: -2, credit: -1); Transduction 3 maps the reordered tree back to a string.]
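One plausible reading of the position annotations is as head-relative offsets: negative positions precede the head, positive ones follow it. A hedged sketch under that reading follows; the representation and the `flatten` helper are invented.

```python
def flatten(head, children):
    """Linearize a one-level dependency tree whose children carry
    head-relative positions: negative positions precede the head,
    positive positions follow it."""
    ordered = sorted(children, key=lambda c: c[1])
    before = [w for w, pos in ordered if pos < 0]
    after = [w for w, pos in ordered if pos > 0]
    return " ".join(before + [head] + after)

# Spanish order: my(-1) card credit(+1)
print(flatten("card", [("my", -1), ("credit", +1)]))  # my card credit
# English order after Transduction 2: my(-2) credit(-1) card
print(flatten("card", [("my", -2), ("credit", -1)]))  # my credit card
```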
28
ASR-based Speech Translation
[Figure: training and decoding pipeline. Training: sentence pairs (W_S, W_T) are aligned into F(W_S, W_T), from which VNST learning, bi-phrase learning and tree reordering produce the translation machines a, b and c; acoustic model training on (W_S, Speech) produces the acoustic model A, and the lexicon FSM produces the lexicon L. Decoding: the speech recognizer runs the composition of A, L and the translation machines a, b, c.]
29
Overview
• Motivation
• Stochastic Finite State Machines
• Learning Machine Translation Models
• Case study
– MT for Human-Machine Spoken Dialog
• Experiments and Results
30
MT Evaluation
• Lexical Accuracy (LA)
  – Bag of words
• Translation Accuracy (TA)
  – Based on string alignment
• Application-driven evaluation
  – "How May I Help You?"
  – Spoken dialog for call routing
  – Classification based on salient phrase detection
31
Automated Services and Customer Care
via Natural Spoken Dialog
• Prompt is "AT&T. How may I help you?"
• User responds with unconstrained fluent speech
• Spoken dialog system for call routing
[Figure: example call-routing categories: HELP, area code, billing, credit, DA, rate, …]
32
Examples
• Yes I like to make this long distance call area code x x x x x x x x x x
• Yeah I need the area code for rockmart georgia
• Yeah I’m wondering if you could place this call for me I can’t seem to dial it it don’t seem to want to go through for me
33
Call-Classification Performance
• False Rejection Rate: probability of rejecting a call, given that the call-type is one of the 14 call-types in the set.
• Probability Correct: probability of correctly classifying a call, given that the call is not rejected.
36
Conclusion
• The stochastic finite-state approach is viable and effective for limited-domain MT.
• Finite-state model chain for complex speech and language constraints.
• Multilingual speech applications enabled by MT.
• Coupling of ASR and MT.
http://www.research.att.com/~srini/Projects/Anuvaad/home.html
37
Biblio
- J. Berstel, "Transductions and Context-Free Languages", Teubner Studienbücher.
- G. Riccardi, R. Pieraccini and E. Bocchieri, "Stochastic Automata for Language Modeling", Computer Speech and Language, 10, pp. 265-293, 1996.
- F. C. N. Pereira and M. Riley, "Speech Recognition by Composition of Weighted Finite Automata", in Finite-State Language Processing, MIT Press, Cambridge, Massachusetts, 1997.
- S. Bangalore and G. Riccardi, "Stochastic Finite-State Models for Spoken Language Machine Translation", Workshop on Embedded Machine Translation Systems, NAACL, pp. 52-59, Seattle, May 2000.
More references on http://www.research.att.com/info/dsp3
39
Stochastic Finite State Models: from concepts to speech
• Variable Ngram Stochastic Automata (VNSA)
  – Concept modeling for NLU
  – Word sequence modeling for ASR
• Phonotactic transducers (context-to-phone)
• Tree-structured transducers (phone-to-word)
• Stochastic-FSM based ASR (context-to-concept)
• ATIS Evaluation: it actually worked!
[Timeline: 1993 → 1994]
40
Why did it work?
• Symbolic representation (SFSM) for probabilistic sequence modeling (words, concepts, …).
• Learning algorithms
  – Cascade (phrase grammar → {phrases, word classes} → words)
• Machine combination
  – Context-to-Phone, Phone-to-Word, Context-to-Grammar (CLG)
• Decoding very simple and fast (Viterbi and beam-width search)
48
Multilingual Speech Processing
[Figure: an ASR-MT engine translating the spoken input "Tarjeta de credito" into "Credit card".]
• The finite-state chain allows for:
  – Speech and language coupling (e.g. prosody, recognition errors)
  – Integrated multilingual processing
49
Speech Translation
• Previous approaches to speech translation:
  – Source language ASR
  – Translation model
• Finite-state model based speech translation:
  – Source language acoustic model
  – Lexical choice model
  – Lexical reordering model
51
Learning the state space and state transition function (revised)
• For each suffix in the corpus, we create two states (one for string recognition, the other for backoff, reached by an ε-transition).
• The size of the automaton is still linear in the corpus size.
• The stochastic automaton can compute a word probability for all strings in V*!
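A rough sketch of the construction in the bullets above: the pairing of a recognition state and a backoff state per observed history follows the slide, while the function name, data layout, and untrained (weightless) arcs are invented for illustration.

```python
from collections import defaultdict

def build_vnsa_states(corpus, n=3):
    """For every history (word tuple of length < n) seen in the corpus,
    create a recognition state and a backoff state; an epsilon arc links
    each recognition state to the backoff state of the history shortened
    by its oldest word. No weights are trained in this sketch."""
    states, eps_arcs = set(), set()
    out_words = defaultdict(set)
    for sentence in corpus:
        w = sentence.split()
        for i in range(len(w)):
            for k in range(n):               # histories of length 0..n-1
                if i - k < 0:
                    break
                hist = tuple(w[i - k:i])
                states.add(("rec", hist))
                states.add(("bo", hist))
                out_words[hist].add(w[i])    # word arc leaving this history
                if hist:
                    eps_arcs.add((("rec", hist), ("bo", hist[1:])))
    return states, eps_arcs, out_words

corpus = ["the president of united states",
          "the president of united airlines"]
states, eps_arcs, _ = build_vnsa_states(corpus)
print(len(states), "states,", len(eps_arcs), "epsilon arcs")
```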
52
Word Prediction
• Example: …the President of United ???…
[Figure: predicting the next word from the state reached by "the president of United". Features: History(4) = "the president of United"; PrevClass = Adj; PrevPrevClass = Function Word; Trigger(10) = "elections". Outgoing arcs: States/p1, Airlines/p2, ε/p3. State transition probabilities define the stochastic finite-state automaton/transducer.]
53
Learning Lexical Choice Models
• English utterances recorded from customer calls.
• Manually translated into Japanese/Spanish.
• "Bunsetsu"-like tokenization for Japanese.
• Alignment:
  English: I'd like to charge this to my home phone
  Japanese: 私は これを 私の 家の 電話に チャージ したいのです
  Alignment: 1 7 0 6 2 0 3 4 5
• Bilanguage:
  I'd_私は like_したいのです to_ charge_チャージ this_これを to_ my_私の home_家の phone_電話に
54
Learning Bilingual Stochastic Transducers
• Learn stochastic transducers from the bilanguage (Embedded MT 2000)
• Learn bilingual phrases automatically
  – Reordering within phrases:
    エイ ティー アンド ティー ↔ A T and T
    私の 家の 電話に ↔ to my home phone
    私は コレクト コールを かける 必要があります ↔ I need to make a collect call
    tarjeta de credito ↔ credit card
    una llamada de larga distancia ↔ a long distance call
55
Lexical Choice Transducer
• Language model: an N-gram model built on the phrase-chunked bilanguage.
• A combination of phrases and words maximizes predictive power and minimizes the number of parameters.
• The resulting finite-state automaton on the bilanguage vocabulary is converted into a finite-state transducer.
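The conversion in the last bullet is mechanical: each bi-language symbol x_y on an automaton arc becomes an input:output pair on a transducer arc. A hedged sketch, in which the delimiter, epsilon convention, and arc layout are invented:

```python
def fsa_to_fst(arcs, sep="_"):
    """Turn automaton arcs labeled with bi-language symbols 'x_y' into
    transducer arcs with input x and output y; an empty side becomes
    epsilon."""
    eps = "ε"
    fst = []
    for src, label, dst, w in arcs:
        x, _, y = label.partition(sep)
        fst.append((src, x or eps, y or eps, dst, w))
    return fst

fsa = [(0, "I'd_私は", 1, 0.2), (1, "to_", 2, 0.1)]
print(fsa_to_fst(fsa))
# [(0, "I'd", '私は', 1, 0.2), (1, 'to', 'ε', 2, 0.1)]
```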
56
Lexical Reordering
• Output of the lexical choice transducer: a sequence of target language phrases
  – like to make I'd call a calling card please
• Words within each phrase are already in target language word order.
• However, the phrases themselves still need to be reordered into target language word order.
• Reordered:
  – I'd like to make a calling card call please
57
Lexical Reordering Models
• Tree-based model
  – Impose a tree structure on a sentence (Alshawi et al., ACL 1998)
  – English: I'd like to charge this to my home phone
58
Lexical Reordering Models
• Reordering using tree-local reordering rules.
• Eng-Jap (lexical choice output): 私は したいのです チャージ これを 私の 家の 電話に
[Figure: two dependency trees over the nodes 私は (I), これを (this), したいのです (like), チャージ (charge), 家の (home), 私の (my), 電話に (phone): one in the source word order and one after the tree-local reordering rules have applied.]
• Japanese (target order): 私は これを 私の 家の 電話に チャージ したいのです
59
Lexical Reordering Models (contd.)
• Dependency tree represented as a bracketed string with reordering instructions:
  :[ したいのです : したいのです :-1 :[ チャージ : チャージ :] :]
• Lexical reordering FST: the result of training a stochastic finite-state transducer on the corpus of bracketed strings.
• Output of the lexical reordering FST: strings with reordering instructions.
• Instructions are interpreted to form the target language sentence.
60
Translation using stochastic FSTs
• Sequence of finite-state transductions:
  English: I'd like to charge this to my home phone
  Eng-Jap (lexical choice): 私は したいのです チャージ これを 私の 家の 電話に
  Japanese (after reordering): 私は これを 私の 家の 電話に チャージ したいのです
[Figure: the dependency trees over 私は (I), これを (this), したいのです (like), チャージ (charge), 家の (home), 私の (my), 電話に (phone) before and after lexical reordering.]
61
Spoken Language Corpora
• Prompt: How may I help you?
• Examples:
  Yeah I need the area code for rockmart georgia
  Yes I'd like to make this long distance call area code x x x x x x x x x x
  Yeah I'm wondering if you could place this call for me I can't seem to dial it it don't seem to want to go through for me
• Parallel corpora: English, Japanese, Spanish.
62
Evaluation Metric
• Evaluation metric for MT is a complex issue.
• String edit distance between the reference string and the result string (length in words: R):
  – Insertions (I)
  – Deletions (D)
  – Substitutions (S)
  – Moves = pairs of deletions and insertions (M)
  – Remaining insertions (I') and deletions (D')
• Simple String Accuracy = 1 - (I + D + S) / R
• Generation String Accuracy = 1 - (M + I' + D' + S) / R
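A hedged sketch of the simple string accuracy: compute the minimal number of insertions, deletions, and substitutions by dynamic programming and plug the total into 1 - (I + D + S)/R. The generation string accuracy would additionally pair off a deletion and an insertion of the same word as a single move (M), which this sketch omits.

```python
def simple_string_accuracy(reference, result):
    """1 - (I + D + S) / R with I, D, S taken from a minimal
    edit-distance (Levenshtein) alignment over words."""
    ref, hyp = reference.split(), result.split()
    R, H = len(ref), len(hyp)
    dp = [[0] * (H + 1) for _ in range(R + 1)]
    for i in range(R + 1):
        for j in range(H + 1):
            if i == 0:
                dp[i][j] = j                          # insertions only
            elif j == 0:
                dp[i][j] = i                          # deletions only
            elif ref[i - 1] == hyp[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]           # match
            else:
                dp[i][j] = 1 + min(dp[i - 1][j - 1],  # substitution
                                   dp[i - 1][j],      # deletion
                                   dp[i][j - 1])      # insertion
    return 1 - dp[R][H] / R

print(simple_string_accuracy(
    "I'd like to make a calling card call please",
    "like to make I'd call a calling card please"))
```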
63
Experiments and Evaluation
• Data collection:
  – The customer side of operator-customer conversations was transcribed
  – Transcriptions were then manually translated into Japanese
• Training set: 12226 English-Japanese sentence pairs
• Test set: 3253 English sentences
• Different translation models:
  – Local Reordering (LR), Sentence Reordering (SR)
  – Word bigram and phrase bigram
64
Results
• English-Japanese Translation:

                  No LR   LR     SR
  Word n-gram     57.0    57.0   67.5
  Phrase n-gram   58.0    60.4   70.9

• Phrase-based models outperform word-based models
• Accuracy improves with local reordering (LR) (from 58.0% to 60.4% for the phrase n-gram)
• Inclusion of sentence-level reordering (SR) improves accuracy dramatically (from 58.0% to 70.9% for the phrase n-gram)
65
Summary
• FSTs provide a framework for the integration of modules.
• FST representation for lexical choice (Embedded MT Workshop, 2000).
• Here, lexical reordering as an FST.
• Integrated finite-state model for speech translation.
67
Machine Translation
"When I look at an article in Russian, I say: 'This is really written in English, but it has been coded in some strange symbols. I will now proceed to decode.'"
(Warren Weaver, March 1947)
Is the process a finite-state decoding chain?
69
Bilingual Phrases (reordered)
  A T and T                  ↔ A T y T
  area code                  ↔ codigo de area
  to my home phone           ↔ al telefono de mi casa
  charge it to my home phone ↔ cargarla al telefono de mi casa
70
Learning SFST from Bi-language
• VNSM learning algorithm applied to the bi-language training set
• Joint probability:
  $P(W_S, W_T) = \prod_i P\big((x_i, y_i) \mid (x_{i-n+1}, y_{i-n+1}) \ldots (x_{i-1}, y_{i-1})\big)$
  – Unseen translation pairs (bi-language token backoff)
  – Out-of-vocabulary source language words
• String-to-string mapping
  – Ambiguous machine ("multiple translations")
71
Lexical Choice Transducer
• Language model: an N-gram model built on the phrase-chunked bilanguage.
• A combination of phrases and words maximizes predictive power and minimizes the number of parameters.
• The resulting finite-state automaton on the bilanguage vocabulary is converted into a finite-state transducer.
72
Finite State versus Non-Finite State MT
• Head-Transducer MT (Alshawi, Bangalore and Douglas, 1998)

Spanish-English (Text):
                   LA     TA
  Finite-State     77.4   60.4
  Head-Transducer  77.3   62.3

English-Japanese (Speech):
                   LA     TA
  Finite-State     77.6   58.1
  Head-Transducer  78.7   58.7
73
Lexical Choice Accuracy (Spanish-to-English)

  VNST order       Recall R   Precision P   F-Measure (2*P*R/(P+R))
  Unigram          30.7       86.4          45.4
  Bigram           69.0       84.4          75.8
  Trigram          66.8       79.0          72.1
  Phrase Unigram   45.6       86.6          59.7
  Phrase Bigram    72.2       83.4          77.4
  Phrase Trigram   71.9       83.1          77.1
74
Understanding Spontaneous Speech
1. Transcr: yes I wanted to charge a call to my business phone
ASR+parse: yes I_WANNA_CHARGE call to distance phone
Transd+decision: {THIRD NUMBER 0.75}
2. Transcr: hi I like to stick it on my calling card please
ASR+parse: hi I'D_LIKE_TO_SPEAK ON_MY_CALLING_CARD please
Transd+decision: {PERSON_PERSON 0.78} {CALLING CARD 0.96}
3. Transcr: I'm trying to find out a particular staple which fits one of your guns or a your desk staplers
ASR+parse: I'm trying TO_FIND_OUT they've CHECKED this is still call which six one of your thompson area state for ask stay with the
Transd+decision: {RATE 0.25} {OTHER 0.86}