Machine Translation Overview
Alon Lavie, Language Technologies Institute, Carnegie Mellon University
LTI Immigration Course, August 16, 2010

TRANSCRIPT

  • Machine Translation Overview
    Alon Lavie, Language Technologies Institute, Carnegie Mellon University
    LTI Immigration Course, August 16, 2010

  • Machine Translation: History
    - 1946: MT is one of the first conceived applications of modern computers (A.D. Booth, Alan Turing)
    - 1954: The Georgetown Experiment: promising toy demonstrations of Russian-to-English MT
    - Late 1950s and early 1960s: MT fails to scale up to real systems
    - 1966: ALPAC Report: MT recognized as an extremely difficult, "AI-complete" problem; funding disappears
    - 1968: SYSTRAN founded
    - 1985: CMU Center for Machine Translation (CMT) founded
    - Late 1980s and early 1990s: field dominated by rule-based approaches (KBMT, KANT, Eurotra, etc.)
    - 1992: noisy-channel statistical MT models invented by IBM researchers (Brown, Della Pietra, et al.): CANDIDE
    - Mid 1990s: first major DARPA MT program: PANGLOSS
    - Late 1990s: major speech-to-speech MT demonstrations: C-STAR
    - 1999: JHU Summer Workshop results in GIZA
    - 2000s: large DARPA funding programs: TIDES and GALE
    - 2003: Och et al. introduce phrase-based SMT: PHARAOH
    - 2006: Google Translate is launched
    - 2007: Koehn et al. release MOSES

  • Machine Translation: Where Are We Today?
    - Age of Internet and globalization: great demand for translation services and MT
      - Multiple official languages of the UN, EU, Canada, etc.
      - Documentation dissemination for large manufacturers (Microsoft, IBM, Intel, Apple, Caterpillar, US Steel, ALCOA, etc.)
    - The language and translation services business sector was estimated at $15 billion worldwide in 2008 and is growing at a healthy pace
    - The economic incentive is still primarily within a small number of language pairs
    - Some fairly decent commercial products on the market for these language pairs, primarily a product of rule-based systems after many years of development
    - New generation of data-driven statistical MT: Google, Language Weaver
    - Web-based (mostly free) MT services: Google, Babelfish, others
    - Pervasive MT between many language pairs is still non-existent, but Google is trying to change that!

  • How Does MT Work?
    - All modern MT approaches build translations for complete sentences by putting together smaller pieces of translation
    - Core questions:
      - What are these smaller pieces of translation? Where do they come from?
      - How does MT put these pieces together?
      - How does the MT system pick the correct (or best) translation among many options?

  • Core Challenges of MT
    - Ambiguity and language divergences:
      - Human languages are highly ambiguous, and differently so in different languages
      - Ambiguity at all levels: lexical, syntactic, semantic, language-specific constructions and idioms
    - Amount of required knowledge:
      - Translation equivalencies for vast vocabularies (several hundred thousand words and phrases)
      - Syntactic knowledge (how to map the syntax of one language to another), plus more complex language divergences (semantic differences, constructions and idioms, etc.)
      - How do you acquire and construct a knowledge base that big that is (even mostly) correct and consistent?

  • Rule-based vs. Data-driven Approaches to MT
    - What are the pieces of translation? Where do they come from?
      - Rule-based: large-scale clean word translation lexicons, manually constructed over time by experts
      - Data-driven: broad-coverage word and multi-word translation lexicons, learned automatically from available sentence-parallel corpora
    - How does MT put these pieces together?
      - Rule-based: large collections of rules, manually developed over time by human experts, that map structures from the source to the target language
      - Data-driven: a computer algorithm that explores millions of possible ways of putting the small pieces together, looking for the translation that statistically looks best

  • Rule-based vs. Data-driven Approaches to MT
    - How does the MT system pick the correct (or best) translation among many options?
      - Rule-based: human experts encode preferences among the rules, designed to prefer creation of better translations
      - Data-driven: a variety of fitness and preference scores, many of which can be learned from available training data, are used to model a total score for each of the millions of possible translation candidates; the algorithm then selects and outputs the best-scoring translation

  • Rule-based vs. Data-driven Approaches to MT
    - Why have the data-driven approaches become so popular? We can now do this!
      - Increasing amounts of sentence-parallel data are constantly being created on the web
      - Advances in machine learning algorithms
      - The computational power of today's computers can train systems on these massive amounts of data, and can perform these massive search-based translation computations when translating new texts
    - Building and maintaining rule-based systems is too difficult, expensive and time-consuming
    - In many scenarios, it actually works better!

  • Statistical MT (SMT)
    - Data-driven; the most dominant approach in current MT research
    - Proposed by IBM in the early 1990s: a direct, purely statistical model for MT
    - Evolved from word-level translation to phrase-based translation
    - Main ideas:
      - Training: statistical models of word and phrase translation equivalence are learned automatically from bilingual parallel sentences, creating a bilingual database of translations
      - Decoding: new sentences are translated by a program (the decoder), which matches the source words and phrases with the database of translations and searches the space of all possible translation combinations
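The training/decoding split above can be made concrete with a toy decoder. The sketch below is not any real SMT system: the phrase table, its probabilities, and the monotone search (no reordering, no language model) are all invented for illustration.

```python
import math

# Toy phrase table: source phrase -> list of (target phrase, probability).
# All entries and probabilities are invented for illustration.
PHRASE_TABLE = {
    ("das", "haus"): [("the house", 0.8)],
    ("das",): [("the", 0.7), ("that", 0.3)],
    ("haus",): [("house", 0.9)],
    ("ist",): [("is", 0.95)],
    ("klein",): [("small", 0.6), ("little", 0.4)],
}

def decode(source_words, max_phrase_len=3):
    """Monotone phrase-based decoding: find the segmentation of the
    source into known phrases that maximizes the product of the
    phrase translation probabilities (no reordering, no LM)."""
    n = len(source_words)
    # best[i] = (log-prob, translation) of the best translation of words[:i]
    best = {0: (0.0, "")}
    for i in range(1, n + 1):
        for j in range(max(0, i - max_phrase_len), i):
            if j not in best:
                continue
            src = tuple(source_words[j:i])
            for tgt, p in PHRASE_TABLE.get(src, []):
                score = best[j][0] + math.log(p)
                cand = (best[j][1] + " " + tgt).strip()
                if i not in best or score > best[i][0]:
                    best[i] = (score, cand)
    return best.get(n, (float("-inf"), None))[1]

print(decode(["das", "haus", "ist", "klein"]))  # -> the house is small
```

Note how the multi-word entry ("das", "haus") outscores translating the two words separately (0.8 vs. 0.7 * 0.9), which is exactly why phrase-based models beat word-by-word translation.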

  • Statistical MT (SMT)
    - Main steps in training phrase-based statistical MT:
      - Create a sentence-aligned parallel corpus
      - Word alignment: train word-level alignment models (GIZA++)
      - Phrase extraction: extract phrase-to-phrase translation correspondences using heuristics (Moses)
      - Minimum Error Rate Training (MERT): optimize translation system parameters on development data to achieve the best translation performance
    - Attractive: completely automatic, no manual rules, much reduced manual labor
    - Main drawbacks:
      - Translation accuracy levels vary widely
      - Effective only with large volumes (several mega-words) of parallel text
      - Broad domain, but domain-sensitive
      - Viable only for a limited number of language pairs!
    - Impressive progress in the last 5-10 years!
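The phrase-extraction step can be illustrated with the standard consistency heuristic: a phrase pair is kept if no alignment link crosses its boundary. The sentence pair, alignment links, and function below are invented toy data, not actual GIZA++/Moses output.

```python
def extract_phrases(src, tgt, alignment, max_len=4):
    """Extract all phrase pairs consistent with a word alignment:
    a (source span, target span) pair is consistent if no alignment
    link connects a word inside the pair to a word outside it."""
    pairs = []
    for i1 in range(len(src)):
        for i2 in range(i1, min(i1 + max_len, len(src))):
            # target positions linked to the source span [i1, i2]
            ts = [t for s, t in alignment if i1 <= s <= i2]
            if not ts:
                continue
            j1, j2 = min(ts), max(ts)
            # consistency: no target word in [j1, j2] links outside [i1, i2]
            if any(j1 <= t <= j2 and not (i1 <= s <= i2)
                   for s, t in alignment):
                continue
            pairs.append((" ".join(src[i1:i2 + 1]),
                          " ".join(tgt[j1:j2 + 1])))
    return pairs

# Toy example; alignment links are (source index, target index) pairs.
src = ["michael", "geht", "nach", "hause"]
tgt = ["michael", "goes", "home"]
links = [(0, 0), (1, 1), (2, 2), (3, 2)]
for pair in extract_phrases(src, tgt, links):
    print(pair)
```

Because "home" is linked to both "nach" and "hause", the heuristic extracts the multi-word pair ("nach hause", "home") but refuses the inconsistent single-word pair ("nach", "home").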

  • Statistical MT: Major Challenges
    - Current approaches are too naive and direct:
      - Good at learning word-to-word and phrase-to-phrase correspondences from data
      - Not good enough at learning how to combine these pieces and reorder them properly during translation
      - Learning general rules requires much more complicated algorithms and computer processing of the data
    - The space of translations that is searched often doesn't contain a perfect translation
    - The fitness scores that are used aren't good enough to always assign better scores to the better translations: we don't always find the best translation even when it's there!
    - MERT is brittle, problematic and metric-dependent!
    - Solutions:
      - Google's solution: more and more data!
      - Research solution: smarter algorithms and learning methods

  • Rule-based vs. Data-driven MT
    Two MT systems translating the same source passage, one rule-based and one data-driven (the errors are the systems' own output):
    - "We thank all participants of the whole world for their comical and creative drawings; to choose the victors was not easy task! Click here to see work of winning European of these two months, and use it to look at what the winning of USA sent us."
    - "We thank all the participants from around the world for their designs cocasses and creative; selecting winners was not easy! Click here to see the artwork of winners European of these two months, and disclosure to look at what the winners of the US have been sending."

  • Representative Example: Google Translate
    http://translate.google.com

  • Google Translate (screenshot slides)

  • Major Sources of Translation Problems
    - Lexical differences: multiple possible translations for a source-language (SL) word, or difficulties expressing the SL word's meaning in a single target-language (TL) word
    - Structural differences: the syntax of the SL is different from the syntax of the TL: word order, sentence and constituent structure
    - Differences in mappings of syntax to semantics: meaning in the TL is conveyed using a different syntactic structure than in the SL
    - Idioms and constructions

  • How to Tackle the Core Challenges
    - Manual labor: thousands of person-years of human experts developing large word and phrase translation lexicons and translation rules. Example: Systran's RBMT systems.
    - Lots of parallel data: data-driven approaches for finding word and phrase correspondences automatically from large amounts of sentence-aligned parallel texts. Example: statistical MT systems.
    - Learning approaches: learn translation rules automatically from small amounts of human-translated and word-aligned data. Example: AVENUE's Statistical XFER approach.
    - Simplify the problem: build systems that are limited-domain or constrained in other ways. Examples: CATALYST, NESPOLE!

  • State-of-the-Art in MT
    - What users want:
      - General purpose (GP): any text
      - High quality (HQ): human level
      - Fully automatic (FA): no user intervention
    - We can meet any 2 of these 3 goals today, but not all three at once:
      - FA + HQ: Knowledge-Based MT (KBMT)
      - FA + GP: Corpus-Based (Example-Based) MT
      - GP + HQ: Human-in-the-loop (post-editing)

  • Types of MT Applications
    - Assimilation: multiple source languages, uncontrolled style/topic. General-purpose MT, no semantic analysis. (GP + FA or GP + HQ)
    - Dissemination: one source language, controlled style, single topic/domain. Special-purpose MT, full semantic analysis. (FA + HQ)
    - Communication: lower quality may be okay, but system robustness and real-time operation are required.

  • Approaches to MT: The Vauquois MT Triangle
    (Diagram: the Direct, Transfer and Interlingua paths between source-language Analysis and target-language Generation, illustrated with:)
    - Source: "Mi chiamo Alon Lavie" -> Target: "My name is Alon Lavie"
    - Interlingua: Give-information+personal-data (name=alon_lavie)
    - Source syntax: [s [vp accusative_pronoun chiamare proper_name]]
    - Target syntax: [s [np [possessive_pronoun name]] [vp be proper_name]]

  • Knowledge-based Interlingual MT
    - The classic "deep" Artificial Intelligence approach:
      - Analyze the source language into a detailed symbolic representation of its meaning
      - Generate this meaning in the target language
    - Interlingua: one single meaning representation for all languages
    - Nice in theory, but extremely difficult in practice:
      - What kind of representation?
      - What is the appropriate level of detail to represent?
      - How to ensure that the interlingua is in fact universal?

  • Interlingua versus Transfer
    - With an interlingua, you need only N parsers/generators instead of N² transfer systems
    - (Diagram: six languages L1-L6 connected pairwise by transfer systems vs. each connected once to a single interlingua)
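The N vs. N² claim is easy to check by counting components; this small helper (an illustration, not from the slides) counts directed language pairs against per-language analyzers and generators:

```python
def components_needed(n_languages):
    """Compare system counts for n languages: direct transfer needs a
    system for every ordered language pair, while interlingua needs
    one analyzer plus one generator per language."""
    transfer = n_languages * (n_languages - 1)  # N^2 - N directed pairs
    interlingua = 2 * n_languages               # N parsers + N generators
    return transfer, interlingua

print(components_needed(6))   # -> (30, 12)
```

For the six languages in the diagram, pairwise transfer needs 30 systems, the interlingua only 12; the gap grows quadratically as languages are added.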

  • Multi-Engine MT
    - Apply several MT engines to each input in parallel
    - Create a combined translation from the individual translations
    - Goal is to combine strengths and avoid weaknesses, along all dimensions: domain limits, quality, development time/cost, run-time speed, etc.
    - Various approaches to the problem

  • Speech-to-Speech MT
    - Speech just makes MT (much) more difficult:
      - Spoken language is messier: false starts, filled pauses, repetitions, out-of-vocabulary words
      - Lack of punctuation and explicit sentence boundaries
      - Current speech technology is far from perfect
    - Need for speech recognition and synthesis in foreign languages
    - Robustness: MT quality degradation should be proportional to SR quality
    - Tight integration: rather than separate sequential tasks, can SR + MT be integrated in ways that improve end-to-end performance?

  • MT at the LTI
    - The LTI originated as the Center for Machine Translation (CMT) in 1985
    - MT continues to be a prominent sub-discipline of research within the LTI
      - More MT faculty than any of the other areas
      - More MT faculty than anywhere else
    - Active research on all main approaches to MT: Interlingua, Transfer, EBMT, SMT
    - Leader in the area of speech-to-speech MT
    - Multi-Engine MT (MEMT)
    - MT evaluation (METEOR)

  • MT Faculty at LTI
    Alon Lavie, Stephan Vogel, Ralf Brown, Jaime Carbonell, Lori Levin, Noah Smith, Alan Black, Florian Metze, Alex Waibel, Teruko Mitamura, Eric Nyberg

  • Phrase-based Statistical MT
    - Word-to-word and phrase-to-phrase translation pairs are acquired automatically from data and assigned probabilities based on a statistical model
    - Extracted and trained from very large amounts of sentence-aligned parallel text:
      - Word alignment algorithms
      - Phrase detection algorithms
      - Translation model probability estimation
    - Main approach pursued in CMU systems in the DARPA/TIDES program and now in GALE: Chinese-to-English and Arabic-to-English
    - Most active work is on improved word alignment, phrase extraction and advanced decoding techniques
    - Contact Faculty: Stephan Vogel
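The word-alignment models mentioned here start from IBM Model 1, whose EM training loop is small enough to sketch. The three-sentence corpus below is invented, and this is a deliberate simplification of what GIZA++ actually implements (no NULL word, no fertility or distortion models):

```python
from collections import defaultdict

# Tiny German-English parallel corpus, invented for illustration.
corpus = [
    (["das", "haus"], ["the", "house"]),
    (["das", "buch"], ["the", "book"]),
    (["ein", "buch"], ["a", "book"]),
]

def train_ibm1(corpus, iterations=20):
    """EM training of IBM Model 1 translation probabilities t(f|e),
    the word-level model that alignment training starts from."""
    f_vocab = {f for fs, _ in corpus for f in fs}
    uniform = 1.0 / len(f_vocab)
    t = defaultdict(lambda: uniform)  # t[(f, e)], uniformly initialized
    for _ in range(iterations):
        count = defaultdict(float)    # expected co-occurrence counts
        total = defaultdict(float)
        for fs, es in corpus:
            for f in fs:
                norm = sum(t[(f, e)] for e in es)
                for e in es:
                    frac = t[(f, e)] / norm   # E-step: expected count
                    count[(f, e)] += frac
                    total[e] += frac
        for (f, e), c in count.items():
            t[(f, e)] = c / total[e]          # M-step: renormalize
    return t

t = train_ibm1(corpus)
print(t[("haus", "house")])  # converges toward 1.0
```

Even though "das" and "haus" each co-occur with both "the" and "house", the second and third sentence pairs disambiguate them, and EM drives t(haus|house) and t(das|the) toward 1.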

  • CMU Statistical Transfer (Stat-XFER) MT Approach
    - Integrates the major strengths of rule-based and statistical MT within a common syntax-based, statistically-driven framework:
      - Linguistically rich formalism that can express complex and abstract compositional transfer rules
      - Rules can be written by human experts and also acquired automatically from data
      - Easy integration of morphological analyzers and generators
      - Word and syntactic-phrase correspondences can be automatically acquired from parallel text
      - Search-based decoding from statistical MT adapted to find the best translation within the syntax-driven search space: multi-feature scoring, beam search, parameter optimization, etc.
    - Framework suitable for both resource-rich and resource-poor language scenarios
    - Most active work on phrase and rule acquisition from parallel data, efficient decoding, joint decoding with non-syntactic phrases, effective syntactic modeling, and MT for low-resource languages
    - Contact Faculty: Alon Lavie, Lori Levin, Bob Frederking and Jaime Carbonell

  • EBMT
    - Developed originally for the PANGLOSS system in the early 1990s: translation between English and Spanish
    - Generalized EBMT under development for the past several years
    - Used in a variety of projects in recent years: the DARPA TIDES and GALE programs, DIPLOMAT and TONGUES
    - Active research work on improving alignment and indexing, decoding from a lattice
    - Contact Faculty: Ralf Brown and Jaime Carbonell

  • Speech-to-Speech MT
    - Evolution from the JANUS/C-STAR systems to NESPOLE!, LingWear, BABYLON, TRANSTAC
    - Early 1990s: first prototype system that fully performed speech-to-speech translation (very limited domains)
    - Interlingua-based, but with shallow task-oriented representations, e.g. "we have single and double rooms available" -> [give-information+availability] (room-type={single, double})
    - Semantic grammars for analysis and generation
    - Multiple languages: English, German, French, Italian, Japanese, Korean, and others
    - Phrase-based SMT applied in speech-to-speech scenarios
    - Most active work on portable speech translation on small devices: Iraqi-Arabic/English and Thai/English
    - Contact Faculty: Alan Black, Stephan Vogel, Florian Metze and Alex Waibel

  • KBMT: KANT, KANTOO, CATALYST
    - Deep knowledge-based framework, with a symbolic interlingua as intermediate representation
    - Syntactic and semantic analysis into an unambiguous, detailed symbolic representation of meaning using unification grammars and transformation mappers
    - Generation into the target language using unification grammars and transformation mappers
    - First large-scale multilingual interlingua-based MT system deployed commercially: CATALYST at Caterpillar: high-quality translation of documentation manuals for heavy equipment
    - Limited domains and controlled English input; minor amounts of post-editing
    - Some active follow-on projects
    - Contact Faculty: Eric Nyberg and Teruko Mitamura

  • Multi-Engine MT
    - Decoding-based approach developed in recent years under DoD and DARPA funding (used in GALE)
    - Main ideas:
      - Treat the original engines as black boxes
      - Align the word and phrase correspondences between the translations
      - Build a collection of synthetic combinations based on the aligned words and phrases
      - Score the synthetic combinations based on a variety of features, including n-gram support and a Language Model
      - Parameter tuning: learn optimal weights using MERT
      - Select the top-scoring synthetic combination
    - Architecture issues: integrating workflows that produce multiple translations and then combine them with MEMT; IBM's UIMA architecture
    - Contact Faculty: Alon Lavie

  • Automated MT Evaluation
    - METEOR: automated metric developed at CMU
    - Improves upon the BLEU metric developed by IBM and used extensively in recent years
    - Main ideas:
      - Assess the similarity between a machine-produced translation and (several) human reference translations
      - Similarity is based on word-to-word matching that matches: identical words; morphological variants of the same word (stemming); synonyms and paraphrases
      - Address fluency/grammaticality via a direct penalty: how well-ordered is the matching of the MT output with the reference?
      - Tunable weights: weights for precision, recall and fragmentation are tuned for optimal correlation with human judgments
    - Outcome: much improved levels of correlation!
    - Contact Faculty: Alon Lavie

  • Safaba Translation Solutions
    - Recent CMU commercial spin-off company in the MT area
    - Mission: develop and deliver advanced translation automation software solutions for the commercial translation business sector
    - Target clients: Language Services Providers (LSPs) and their enterprise clients
    - Primary service: Software-as-a-Service customized MT technology: develop specialized, highly-scalable software for delivering high-quality client-customized Machine Translation (MT) based on a low-cost SaaS model
    - Other related services:
      - Consulting services: analyze LSP/client translation processes and technologies and advise clients on effective solutions for increasing their translation automation
      - Software implementation services: design and implement custom translation automation solutions for LSPs and/or enterprise clients
    - Contact Faculty: Alon Lavie

  • Summary
    - Main challenges for current state-of-the-art MT approaches are coverage and accuracy:
      - Acquiring broad-coverage, high-accuracy translation lexicons (for words and phrases)
      - Learning structural mappings between languages from parallel word-aligned data
      - Overcoming syntax-to-semantics differences and dealing with constructions
      - Stronger target language modeling
      - Novel algorithms for model acquisition and decoding

  • Questions?

  • Lexical Differences
    - SL word has several different meanings that translate differently into the TL. Ex: financial "bank" vs. river "bank"
    - Lexical gaps: an SL word reflects a unique meaning that cannot be expressed by a single word in the TL. Ex: English "snub" doesn't have a corresponding verb in French or German
    - TL has finer distinctions than SL: the SL word should be translated differently in different contexts. Ex: English "wall" can be German "Wand" (internal) or "Mauer" (external)

  • Structural Differences
    - Syntax of the SL is different from the syntax of the TL:
      - Word order within constituents: English NPs are art adj n ("the big boy"); Hebrew NPs are art n art adj ("ha yeled ha gadol")
      - Constituent structure: English is SVO, Subj Verb Obj ("I saw the man"); Modern Arabic is VSO, Verb Subj Obj
      - Different verb syntax: verb complexes in English vs. in German: "I can eat the apple" / "Ich kann den Apfel essen"
      - Case marking and free constituent order: German and other languages that mark case allow "den Apfel esse ich" = the(acc) apple eat I(nom)

  • Syntax-to-Semantics Differences
    - Meaning in the TL is conveyed using a different syntactic structure than in the SL:
      - Changes in the verb and its arguments
      - Passive constructions
      - Motion verbs and state verbs
      - Case creation and case absorption
    - Main distinction from structural differences:
      - Structural differences are mostly independent of lexical choices and their semantic meaning; they can be addressed by transfer rules that are syntactic in nature
      - Syntax-to-semantics mapping differences are meaning-specific: they require the presence of specific words (and meanings) in the SL

  • Syntax-to-Semantics Differences
    - Structure-change example: "I like swimming" / "Ich schwimme gern" (lit: I swim gladly)
    - Verb-argument example: "Jones likes the film." / "Le film plaît à Jones." (lit: the film pleases to Jones)
    - Passive constructions, e.g. French reflexive passives: "Ces livres se lisent facilement" / *"These books read themselves easily" / "These books are easily read"

  • Idioms and Constructions
    - Main distinction: the meaning of the whole is not directly compositional from the meaning of its sub-parts, so there is no compositional translation
    - Examples:
      - "George is a bull in a china shop"
      - "He kicked the bucket"
      - "Can you please open the window?"

  • Formulaic Utterances
    - English: "Good night."
    - Arabic: "tisbaH cala xEr" (literal gloss: "waking up on good")
    - (Romanization of Arabic from CallHome Egypt)

  • Analysis and Generation: Main Steps
    - Analysis:
      - Morphological analysis (word-level) and POS tagging
      - Syntactic analysis and disambiguation (produce a syntactic parse tree)
      - Semantic analysis and disambiguation (produce symbolic frames or a logical-form representation)
      - Map to a language-independent interlingua
    - Generation:
      - Generate the semantic representation in the TL
      - Sentence planning: generate the syntactic structure and lexical selections for concepts
      - Surface-form realization: generate the correct forms of words

  • Direct Approaches
    - No intermediate stage in the translation
    - First MT systems developed in the 1950s-60s (assembly-code programs): morphology, bilingual dictionary lookup, local reordering rules
    - Word-for-word, with some local word-order adjustments
    - Modern approaches: phrase-based Statistical MT (SMT), Example-based MT (EBMT)

  • Statistical MT (SMT)
    - Proposed by IBM in the early 1990s: a direct, purely statistical model for MT
    - Most dominant approach in current MT research
    - Evolved from word-level translation to phrase-based translation
    - Main ideas:
      - Training: statistical models of word and phrase translation equivalence are learned automatically from bilingual parallel sentences, creating a bilingual database of translations
      - Decoding: new sentences are translated by a program (the decoder), which matches the source words and phrases with the database of translations and searches the space of all possible translation combinations

  • Statistical MT (SMT)
    - Main steps in training phrase-based statistical MT:
      - Create a sentence-aligned parallel corpus
      - Word alignment: train word-level alignment models (GIZA++)
      - Phrase extraction: extract phrase-to-phrase translation correspondences using heuristics (Pharaoh)
      - Minimum Error Rate Training (MERT): optimize translation system parameters on development data to achieve the best translation performance
    - Attractive: completely automatic, no manual rules, much reduced manual labor
    - Main drawbacks:
      - Translation accuracy levels vary
      - Effective only with large volumes (several mega-words) of parallel text
      - Broad domain, but domain-sensitive
      - Still viable only for a small number of language pairs!
    - Impressive progress in the last 5 years

  • EBMT Paradigm
    - New sentence (source): "Yesterday, 200 delegates met with President Clinton."
    - Matches to source found, with sub-sentential alignment:
      - "Yesterday, 200 delegates met behind closed doors ..." / "Gestern trafen sich 200 Abgeordnete hinter verschlossenen ..."
      - "... difficulties with President Clinton over ..." / "... Schwierigkeiten mit Praesident Clinton ..."
    - Translated sentence (target): "Gestern trafen sich 200 Abgeordnete mit Praesident Clinton."

  • Transfer Approaches
    - Syntactic transfer:
      - Analyze the SL input sentence into its syntactic structure (parse tree)
      - Transfer the SL parse tree to a TL parse tree (various formalisms for specifying the mappings)
      - Generate the TL sentence from the TL parse tree
    - Semantic transfer:
      - Analyze the SL input into a language-specific semantic representation (e.g., case frames, logical form)
      - Transfer the SL semantic representation to a TL semantic representation
      - Generate the syntactic structure and then the surface sentence in the TL
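A syntactic transfer rule of the kind described can be sketched on toy parse trees: here, reordering an English SVO clause into VSO order (as in Modern Arabic, per the structural-differences slide). The tuple tree encoding and the rule itself are illustrative only, not any real transfer formalism.

```python
# Trees are (label, children) tuples, where children is either a list
# of subtrees or a word string. All names are invented for illustration.

def transfer_svo_to_vso(tree):
    """Toy parse-tree-to-parse-tree transfer rule:
    S -> NP VP[V Obj...] is rewritten as S -> V NP Obj... (VSO order)."""
    label, children = tree
    if label == "S" and [c[0] for c in children] == ["NP", "VP"]:
        subj, vp = children
        verb, *objects = vp[1]          # split the VP into verb + objects
        return ("S", [verb, subj] + objects)
    return tree                          # rule does not apply: leave as-is

english = ("S", [("NP", "I"),
                 ("VP", [("V", "saw"), ("NP", "the man")])])
print(transfer_svo_to_vso(english))
# -> ('S', [('V', 'saw'), ('NP', 'I'), ('NP', 'the man')])
```

Note that the rule is purely structural: it fires on any SVO clause regardless of which words fill the slots, which is exactly the sense in which syntactic transfer is "mostly independent of lexical choices".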

  • Transfer Approaches
    - Main advantages and disadvantages:
      - Syntactic transfer:
        - No need for semantic analysis and generation
        - Syntactic structures are general, not domain-specific: less domain-dependent, can handle open domains
        - Requires a word translation lexicon
      - Semantic transfer:
        - Requires deeper analysis and generation; a symbolic representation of concepts and predicates is difficult to construct for open or unlimited domains
        - Can better handle non-compositional meaning structures, so it can be more accurate
        - No word translation lexicon: generate in the TL from symbolic concepts

  • The METEOR Metric: Example
    - Reference: "the Iraqi weapons are to be handed over to the army within two weeks"
    - MT output: "in two weeks Iraq's weapons will give army"
    - Matching: Ref: Iraqi weapons army two weeks; MT: two weeks Iraq's weapons army
    - P = 5/8 = 0.625; R = 5/14 = 0.357
    - Fmean = 10*P*R / (9*P + R) = 0.3731
    - Fragmentation: 3 fragments of 5 words: frag = (3-1)/(5-1) = 0.50
    - Discounting factor: DF = 0.5 * (frag^3) = 0.0625
    - Final score: Fmean * (1 - DF) = 0.3731 * 0.9375 = 0.3498
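The slide's arithmetic can be reproduced directly. The function below mirrors only the formula shown (an Fmean weighted 9:1 toward recall, plus a cubic fragmentation penalty capped at 0.5); it is not the full METEOR implementation, which also does the stemming/synonym matching itself.

```python
def meteor_score(matches, mt_len, ref_len, chunks):
    """Reproduce the METEOR arithmetic from the slide: unigram
    precision/recall, a recall-weighted harmonic mean, and a
    fragmentation penalty based on the number of matched chunks."""
    p = matches / mt_len                 # precision: 5/8 = 0.625
    r = matches / ref_len                # recall:    5/14 = 0.357
    fmean = 10 * p * r / (9 * p + r)     # heavily recall-weighted mean
    frag = (chunks - 1) / (matches - 1)  # 3 chunks of 5 words -> 0.50
    penalty = 0.5 * frag ** 3            # discounting factor DF
    return fmean * (1 - penalty)

# 5 matched unigrams, MT output of 8 words, reference of 14, 3 chunks:
print(round(meteor_score(5, 8, 14, 3), 4))  # -> 0.3498
```

The cubic penalty is gentle for nearly-contiguous matchings and harsh for scattered ones, which is how the metric rewards well-ordered output.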


  • Synthetic Combination MEMT
    - Two-stage approach:
      - Align: identify common words and phrases across the translations provided by the engines
      - Decode: search the space of synthetic combinations of words/phrases and select the highest-scoring combined translation
    - Example:
      - Engine 1: "announced afghan authorities on saturday reconstituted four intergovernmental committees"
      - Engine 2: "The Afghan authorities on Saturday the formation of the four committees of government"
      - MEMT: "the afghan authorities announced on Saturday the formation of four intergovernmental committees"

    (Speaker notes) The demonstration is now a set of KANT interlinguas, downloaded from the AMTA interlingua workshop webpage; clicking on sentences shows the IL. Note the use of phrasal lexical units! Start with #4: a simple example (with a 2-level generalized quantifier!). 1: relative clause. 3: conjunction "but", coordination, another generalized quantifier. 5: implicit relative clause, apposition. 6: latter/former. 7: appositive VP. 8: quote marks. 13/14: complex [numbers differ between the front page and the files!]. 21/22: "all" is (:OR SINGULAR PLURAL) [ambiguity packing!]