Automatic Rule Learning for Resource-Limited Machine
Translation
Alon Lavie, Katharina Probst, Erik Peterson, Jaime Carbonell, Lori Levin,
Ralf Brown
Language Technologies Institute, Carnegie Mellon University
October 11, 2002 AMTA 2002 2
Why Machine Translation for Minority and Indigenous Languages?
• Commercial MT economically feasible for only a handful of major languages with large resources (corpora, human developers)
• Is there hope for MT for languages with limited resources?
• Benefits include:
– Better government access to indigenous communities (epidemics, crop failures, etc.)
– Better participation by indigenous communities in information-rich activities (health care, education, government) without giving up their languages
– Language preservation
– Civilian and military applications (disaster relief)
MT for Minority and Indigenous Languages: Challenges
• Minimal amount of parallel text
• Possibly competing standards for orthography/spelling
• Often relatively few trained linguists
• Access to native informants possible
• Need to minimize development time and cost
AVENUE Partners

Language (status)      | Country     | Institutions
Mapudungun (in place)  | Chile       | Universidad de la Frontera, Institute for Indigenous Studies, Ministry of Education
Quechua (discussion)   | Peru        | Ministry of Education
Iñupiaq (discussion)   | US (Alaska) | Ilisagvik College, Barrow school district, Alaska Rural Systemic Initiative, Trans-Arctic and Antarctic Institute, Alaska Native Language Center
Siona (discussion)     | Colombia    | OAS-CICAD, Plante, Department of the Interior
AVENUE: Two Technical Approaches
• Generalized EBMT
– Parallel text 50K–2MB (uncontrolled corpus)
– Rapid implementation
– Proven for major languages with reduced data
• Transfer-rule learning
– Elicitation (controlled) corpus to extract grammatical properties
– Seeded version-space learning
AVENUE Architecture
[Architecture diagram] A user-driven Learning Module (Elicitation Process → SVS Learning Process) produces Transfer Rules. In the Run-Time Module, SL input passes through the SL Parser, Transfer Engine, and TL Generator, alongside an EBMT Engine and a Unifier Module, to produce TL output.
Learning Transfer-Rules for Languages with Limited Resources
• Rationale:
– Large bilingual corpora not available
– Bilingual native informant(s) can translate and align a small pre-designed elicitation corpus, using an elicitation tool
– Elicitation corpus designed to be typologically comprehensive and compositional
– Transfer-rule engine and new learning approach support acquisition of generalized transfer rules from the data
Overview of Learning Approach
1. Elicitation Corpus: Bilingual data is acquired from a specifically engineered corpus
2. Feature Detection: Gather information about features and their values in the minority language
3. Rule Learning: Infer syntactic transfer rules by first guessing and then iteratively refining
The Elicitation Corpus
• Translated and aligned by a bilingual informant
• Corpus consists of linguistically diverse constructions
• Based on the elicitation and documentation work of field linguists (e.g. Comrie 1977, Bouquiaux 1992)
• Organized compositionally: elicit simple structures first, then use them as building blocks
• Goal: minimize size, maximize coverage
The Transfer Engine

Analysis
Source text is parsed into its grammatical structure, which determines the order in which transfer rules are applied.
Example:
他 看 书。 (he read book)
[S [NP [N 他]] [VP [V 看] [NP [N 书]]]]
Transfer
A target-language tree is created by reordering, insertion, and deletion:

[S [NP [N he]] [VP [V read] [NP [DET a] [N book]]]]
The article “a” is inserted into the object NP. Source words are translated with the transfer lexicon.
Generation
Target-language constraints are checked and the final translation is produced.
E.g. “reads” is chosen over “read” to agree with “he”.
Final translation:
“He reads a book”
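The three stages above can be illustrated with a minimal Python sketch. This is not the AVENUE engine; the tiny lexicon, tree shapes, and agreement check are hypothetical simplifications for this one example.

```python
# Toy sketch of analysis output -> transfer -> generation for the
# example sentence. All names and the lexicon are hypothetical.

TRANSFER_LEXICON = {"他": "he", "看": "read", "书": "book"}

def transfer(source_tree):
    """Map a source (subject, verb, object) triple to a target tree,
    translating words and inserting the article into the object NP."""
    subj, verb, obj = source_tree
    return (TRANSFER_LEXICON[subj],
            TRANSFER_LEXICON[verb],
            ("a", TRANSFER_LEXICON[obj]))   # "a" inserted into object NP

def generate(target_tree):
    """Apply a target-language agreement constraint:
    3rd-person-singular subject selects 'reads' over 'read'."""
    subj, verb, (det, noun) = target_tree
    if subj in ("he", "she", "it"):
        verb = verb + "s"
    return f"{subj.capitalize()} {verb} {det} {noun}"

print(generate(transfer(("他", "看", "书"))))   # He reads a book
```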
Transfer Rule Formalism
A transfer rule specifies:
• Type information
• Part-of-speech/constituent information
• Alignments
• x-side constraints
• y-side constraints
• xy-constraints, e.g. ((Y1 AGR) = (X1 AGR))
;SL: the man, TL: der Mann
NP::NP [DET N] -> [DET N]
((X1::Y1) (X2::Y2)
 ((X1 AGR) = *3-SING) ((X1 DEF) = *DEF)
 ((X2 AGR) = *3-SING) ((X2 COUNT) = +)
 ((Y1 AGR) = *3-SING) ((Y1 DEF) = *DEF)
 ((Y2 AGR) = *3-SING) ((Y2 GENDER) = (Y1 GENDER)))
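One possible in-memory encoding of such a rule is a simple record; this sketch assumes hypothetical Python names, while the system's actual representation is the textual formalism shown above.

```python
# Hypothetical encoding of a transfer rule's components as a dataclass.
from dataclasses import dataclass, field

@dataclass
class TransferRule:
    rule_type: tuple        # e.g. ("NP", "NP")
    x_side: list            # source constituent/POS sequence
    y_side: list            # target constituent/POS sequence
    alignments: list        # 1-based (x_index, y_index) pairs
    constraints: list = field(default_factory=list)

# The "the man" / "der Mann" rule from the slide; value constraints pair a
# (constituent, feature) key with a value, agreement constraints pair it
# with another (constituent, feature) key.
the_man = TransferRule(
    rule_type=("NP", "NP"),
    x_side=["DET", "N"],
    y_side=["DET", "N"],
    alignments=[(1, 1), (2, 2)],
    constraints=[
        (("X1", "AGR"), "*3-SING"), (("X1", "DEF"), "*DEF"),   # x-side values
        (("X2", "AGR"), "*3-SING"), (("X2", "COUNT"), "+"),
        (("Y1", "AGR"), "*3-SING"), (("Y1", "DEF"), "*DEF"),   # y-side values
        (("Y2", "AGR"), "*3-SING"),
        (("Y2", "GENDER"), ("Y1", "GENDER")),                  # agreement
    ],
)
```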
Transfer Rule Formalism (II)
The same rule illustrates both constraint kinds:
• Value constraints, e.g. ((X1 AGR) = *3-SING)
• Agreement constraints, e.g. ((Y2 GENDER) = (Y1 GENDER))

;SL: the man, TL: der Mann
NP::NP [DET N] -> [DET N]
((X1::Y1) (X2::Y2)
 ((X1 AGR) = *3-SING) ((X1 DEF) = *DEF)
 ((X2 AGR) = *3-SING) ((X2 COUNT) = +)
 ((Y1 AGR) = *3-SING) ((Y1 DEF) = *DEF)
 ((Y2 AGR) = *3-SING) ((Y2 GENDER) = (Y1 GENDER)))
Rule Learning - Overview
• Goal: acquire syntactic transfer rules
• Use available knowledge from the source side (grammatical structure)
• Three steps:
1. Flat Seed Generation: first guesses at transfer rules; no syntactic structure
2. Compositionality: use previously learned rules to add structure
3. Seeded Version Space Learning: refine rules by generalizing with validation
Flat Seed Generation
Create a transfer rule that is specific to the sentence pair, but abstracted to the POS level. No syntactic structure.
Element            | Source
SL POS sequence    | f-structure
TL POS sequence    | TL dictionary, aligned SL words
Type information   | corpus, same on SL and TL
Alignments         | informant
x-side constraints | f-structure
y-side constraints | TL dictionary, aligned SL words (list of projecting features)
Flat Seed Generation - Example
The highly qualified applicant did not accept the offer.Der äußerst qualifizierte Bewerber nahm das Angebot nicht an.
((1,1),(2,2),(3,3),(4,4),(6,8),(7,5),(7,9),(8,6),(9,7))
S::S [det adv adj n aux neg v det n] -> [det adv adj n v det n neg vpart]
(;;alignments:
 (x1::y1) (x2::y2) (x3::y3) (x4::y4) (x6::y8) (x7::y5) (x7::y9) (x8::y6) (x9::y7)
 ;;constraints:
 ((x1 def) = *+) ((x4 agr) = *3-sing) ((x5 tense) = *past) ….
 ((y1 def) = *+) ((y3 case) = *nom) ((y4 agr) = *3-sing) …. )
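A minimal sketch of how such a flat seed rule might be assembled from the informant's alignments. The encoding is hypothetical and simplified; constraint extraction from the f-structure and TL dictionary is elided.

```python
# Hypothetical sketch: build a flat, POS-level S::S seed rule from the
# two POS sequences and the informant's 1-based word alignments.

def flat_seed(sl_pos, tl_pos, word_alignments):
    return {
        "type": ("S", "S"),
        "x_side": list(sl_pos),
        "y_side": list(tl_pos),
        "alignments": sorted(word_alignments),
        "constraints": [],   # would be filled from f-structure / TL dictionary
    }

# The example pair from the slide:
seed = flat_seed(
    "det adv adj n aux neg v det n".split(),
    "det adv adj n v det n neg vpart".split(),
    [(1, 1), (2, 2), (3, 3), (4, 4), (6, 8), (7, 5), (7, 9), (8, 6), (9, 7)],
)
```

Note that one source word may align to two target words, as x7 ("accept") does to y5 ("nahm") and y9 ("an").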
Compositionality - Overview
• Traverse the c-structure of the English sentence; add compositional structure for translatable chunks
• Adjust constituent sequences and alignments
• Remove unnecessary constraints, i.e. those that are contained in the lower-level rule
• Adjust constraints: use the f-structure of the correct translation vs. the f-structures of incorrect translations to introduce context constraints
Compositionality - Example
S::S [det adv adj n aux neg v det n] -> [det adv adj n v det n neg vpart]
(;;alignments:
 (x1::y1) (x2::y2) (x3::y3) (x4::y4) (x6::y8) (x7::y5) (x7::y9) (x8::y6) (x9::y7)
 ;;constraints:
 ((x1 def) = *+) ((x4 agr) = *3-sing) ((x5 tense) = *past) ….
 ((y1 def) = *+) ((y3 case) = *nom) ((y4 agr) = *3-sing) …. )

S::S [NP aux neg v det n] -> [NP v det n neg vpart]
(;;alignments:
 (x1::y1) (x3::y5) (x4::y2) (x4::y6) (x5::y3) (x6::y4)
 ;;constraints:
 ((x2 tense) = *past) ….
 ((y1 def) = *+) ((y1 case) = *nom) …. )

NP::NP [det ADJP n] -> [det ADJP n]
((x1::y1) …
 ((y3 agr) = *3-sing) ((x3 agr) = *3-sing)
 ….)
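The chunk-collapsing step in the example can be sketched as follows. This is an illustrative simplification (1-based indices, as in the formalism), not the actual algorithm; it only shows how constituent sequences and alignments are adjusted when a POS chunk is replaced by an NP.

```python
# Hypothetical sketch: collapse a (start, length) chunk on each side into a
# single constituent label and remap the 1-based alignment indices.

def compose(x_side, y_side, alignments, x_chunk, y_chunk, label="NP"):
    (xs, xl), (ys, yl) = x_chunk, y_chunk
    new_x = x_side[:xs - 1] + [label] + x_side[xs - 1 + xl:]
    new_y = y_side[:ys - 1] + [label] + y_side[ys - 1 + yl:]

    def remap(i, start, length):
        if i < start:
            return i
        if i < start + length:
            return start              # inside the chunk: the new label
        return i - (length - 1)       # after the chunk: shift left

    new_align = []
    for x, y in alignments:
        pair = (remap(x, xs, xl), remap(y, ys, yl))
        if pair not in new_align:     # chunk-internal alignments collapse
            new_align.append(pair)
    return new_x, new_y, new_align

# Collapsing "det adv adj n" on both sides reproduces the composed rule above:
new_x, new_y, new_align = compose(
    "det adv adj n aux neg v det n".split(),
    "det adv adj n v det n neg vpart".split(),
    [(1, 1), (2, 2), (3, 3), (4, 4), (6, 8), (7, 5), (7, 9), (8, 6), (9, 7)],
    x_chunk=(1, 4), y_chunk=(1, 4),
)
```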
Seeded Version Space Learning: Overview
• Goal: further generalize the acquired rules
• Methodology:
– Preserve general structural transfer
– Consider relaxing specific feature constraints
• Seed rules are grouped into clusters of similar transfer structure (type, constituent sequences, alignments)
• Each cluster forms a version space: a partially ordered hypothesis space with a specific and a general boundary
• The seed rules in a group form the specific boundary of a version space
• The general boundary is the (implicit) transfer rule with the same type, constituent sequences, and alignments, but no feature constraints
Seeded Version Space Learning
1. Group seed rules into version spaces as above.
2. Make use of the partial order of rules in a version space. The partial order is defined via the f-structures satisfying the constraints.
3. Generalize in the space by repeated merging of rules:
   1. Deletion of a constraint
   2. Moving value constraints to agreement constraints, e.g. ((x1 num) = *pl), ((x3 num) = *pl) → ((x1 num) = (x3 num))
4. Check the translation power of the generalized rules against sentence pairs.
Seeded Version Space Learning: Example
S::S [NP aux neg v det n] -> [NP v det n neg vpart]
(;;alignments:
 (x1::y1) (x3::y5) (x4::y2) (x4::y6) (x5::y3) (x6::y4)
 ;;constraints:
 ((x2 tense) = *past) ….
 ((y1 def) = *+) ((y1 case) = *nom) ((y1 agr) = *3-sing) …
 ((y3 agr) = *3-sing) ((y4 agr) = *3-sing) … )

S::S [NP aux neg v det n] -> [NP v det n neg vpart]
(;;alignments:
 (x1::y1) (x3::y5) (x4::y2) (x4::y6) (x5::y3) (x6::y4)
 ;;constraints:
 ((x2 tense) = *past) …
 ((y1 def) = *+) ((y1 case) = *nom) ((y1 agr) = *3-plu) …
 ((y3 agr) = *3-plu) ((y4 agr) = *3-plu) … )

S::S [NP aux neg v det n] -> [NP v det n neg vpart]
(;;alignments:
 (x1::y1) (x3::y5) (x4::y2) (x4::y6) (x5::y3) (x6::y4)
 ;;constraints:
 ((x2 tense) = *past) …
 ((y1 def) = *+) ((y1 case) = *nom)
 ((y4 agr) = (y3 agr)) … )
Preliminary Evaluation
• English to German
• Corpus of 141 ADJPs, simple NPs, and sentences
• 10-fold cross-validation experiment
• Goals:
– Do we learn useful transfer rules?
– Does compositionality improve generalization?
– Does VS-learning improve generalization?
Summary of Results
• Average translation accuracy on the cross-validation test set: 62%
• Without VS-learning: 43%
• Without compositionality: 57%
• Average number of version spaces: 24
• Average number of sentences per version space: 3.8
• Average number of merges per version space: 1.6
• Percentage of compositional rules: 34%
Conclusions
• New paradigm for learning transfer rules from pre-designed elicitation corpus
• Geared toward languages with very limited resources
• Preliminary experiments validate approach: compositionality and VS-learning improve generalization
Future Work
1. Larger, more diverse elicitation corpus
2. Additional languages (Mapudungun…)
3. Less information on the TL side
4. Reverse translation direction
5. Refine the various algorithms:
   • Operators for VS generalization
   • Generalization VS search
   • Layers for compositionality
6. User-interactive verification
Seeded Version Space Learning: Generalization
• The partial order of the version space.
Definition: a transfer rule tr1 is strictly more general than another transfer rule tr2 if all f-structures that are satisfied by tr2 are also satisfied by tr1.
• Generalize rules by merging them:
– Deletion of a constraint
– Raising two value constraints to an agreement constraint, e.g. ((x1 num) = *pl), ((x3 num) = *pl) → ((x1 num) = (x3 num))
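The two generalization operators can be sketched over a hypothetical encoding of feature constraints as ((index, feature), value) tuples; this is an illustration, not the system's implementation.

```python
# Hypothetical sketch of the two merge operators on constraint lists.

def delete_constraint(constraints, target):
    """Operator 1: drop a single feature constraint."""
    return [c for c in constraints if c != target]

def raise_to_agreement(constraints, lhs1, lhs2):
    """Operator 2: if lhs1 and lhs2 carry the same value, replace the two
    value constraints with one agreement constraint lhs1 = lhs2."""
    vals = dict(constraints)
    if lhs1 in vals and vals.get(lhs1) == vals.get(lhs2):
        kept = [(l, v) for l, v in constraints if l not in (lhs1, lhs2)]
        return kept + [(lhs1, lhs2)]
    return list(constraints)

# ((x1 num) = *pl), ((x3 num) = *pl)  ->  ((x1 num) = (x3 num))
cs = [(("x1", "num"), "*pl"), (("x3", "num"), "*pl")]
raised = raise_to_agreement(cs, ("x1", "num"), ("x3", "num"))
```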
Seeded Version Space Learning: Merging Two Rules
The merging algorithm proceeds in three steps. To merge tr1 and tr2 into tr_merged:

1. Copy all constraints that appear in both tr1 and tr2 into tr_merged.
2. Consider tr1 and tr2 separately. For the remaining constraints in tr1 and tr2, perform all possible instances of raising value constraints to agreement constraints.
3. Repeat step 1.
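The three steps above can be sketched under a hypothetical encoding of each rule's feature constraints as a set of ((index, feature), value) tuples, where an agreement constraint has another (index, feature) pair as its value:

```python
# Hypothetical sketch of the three-step merge of two seed rules.

def raise_all(constraints):
    """Raise every pair of identical value constraints on the same feature
    to a single agreement constraint (simplified to exactly-two pairs)."""
    by_val = {}
    for (idx, feat), val in constraints:
        if isinstance(val, str):                 # value constraint
            by_val.setdefault((feat, val), []).append((idx, feat))
    out = set(constraints)
    for (feat, val), lhss in by_val.items():
        if len(lhss) == 2:
            a, b = sorted(lhss)
            out -= {(a, val), (b, val)}
            out.add((a, b))                      # agreement constraint
    return out

def merge_rules(tr1, tr2):
    merged = tr1 & tr2                           # step 1: shared constraints
    r1 = raise_all(tr1 - merged)                 # step 2: raise the rest,
    r2 = raise_all(tr2 - merged)                 #         each rule separately
    return merged | (r1 & r2)                    # step 3: repeat step 1
```

For example, a *pl rule and a *sg rule that otherwise share a tense constraint merge into the tense constraint plus a raised num-agreement constraint.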
Seeded Version Space Learning: The Search
• The Seeded Version Space algorithm itself is the repeated generalization of rules by merging
• A merge is successful if the set of sentences that can be correctly translated with the merged rule is a superset of the union of the sets translatable with the unmerged rules, i.e., check the translation power of the rule
• Merge until no more successful merges
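The outer search might look like the following sketch. The names are hypothetical; `covered` stands in for running the transfer engine over the training pairs, and the greedy pair scan is one simple realization of "merge until no more successful merges".

```python
# Hypothetical sketch of the version-space search loop: repeatedly replace
# a pair of rules with their merge whenever the merge's coverage is a
# superset of the union of the pair's coverage.

def vs_search(rules, merge, covered):
    rules = list(rules)
    changed = True
    while changed:
        changed = False
        for i in range(len(rules)):
            for j in range(i + 1, len(rules)):
                m = merge(rules[i], rules[j])
                if covered(m) >= covered(rules[i]) | covered(rules[j]):
                    rules = [r for k, r in enumerate(rules) if k not in (i, j)]
                    rules.append(m)              # successful merge
                    changed = True
                    break
            if changed:
                break
    return rules
```

In a toy run, two rules with disjoint coverage merge only if their merged rule covers everything both covered; otherwise both are kept.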