
  • CS 712 presentation

    - SMT and Morphology
    - Case markers and Morphology
    - Interpolated Back-off for factor-based MT
    - Simple syntactic and morphological processing

    Presented by Samiulla S, Jayaprakash S, Subhash K.

  • SMT and morphology

    Source → Target morphology:

    Poor → Poor: Phrase-based (Koehn et al., 2003), e.g., English-Chinese.

    Poor → Rich: Factor-based; semantic and morphological factors are used (Ananthakrishnan et al., 2009) (English-Hindi), (Avramidis and Koehn, 2008) (English-Greek).

    Rich → Poor: Phrase-based with preprocessing; the source-side surface word forms are normalized (Zollmann et al., 2006) (Arabic-English).

    Rich → Rich: Mostly rule-based (transfer/interlingua) (Shilon et al., 2010) (Hebrew-Arabic).

  • Poor to Rich

    Finding the correct morphological features for the output is a difficult task.

    Each inflected form is treated as a different word by the phrase-based model. E.g., लड़का and लड़के are treated as entirely different words.

    Source-side semantics and word order can be used to find the correct morphological structures on the rich target side (Ananthakrishnan et al., 2009; Avramidis and Koehn, 2008).

  • Rich to Poor

    Comparatively easier than Poor-to-Rich, but the problem of data sparsity remains: an unseen inflected form of a word will be treated as an OOV by the phrase-based model.

    E.g., if the word अच्छा has occurred in the training corpus but अच्छे has not, the latter will be treated as an OOV.

    Source-side normalization is needed (Zollmann et al., 2006) (Arabic to English).

  • Poor to Poor

    Phrase-based SMT works well if both languages are morphologically poor.

  • Rich to Rich

    This case is difficult to handle with SMT. Mostly interlingua-based systems are found (Shilon et al., 2010) (Hebrew to Arabic).

  • Case Markers and Morphology: Addressing the Crux of the Fluency Problem in English-Hindi SMT

    By Ananthakrishnan Ramanathan, Hansraj Choudhary, Avishek Ghosh, Pushpak Bhattacharyya

    Presented by Jayaprakash S, Subhash K, Samiulla S

  • Aim

    Accurately generating case markers and suffixes for English-Hindi translation.

    Which entity on the English side encodes the information contained in case markers and suffixes on the Hindi side?

  • Introduction

    Fundamental problems in English-Indian language SMT:

    1. Wide syntactic divergence, leading to word-ordering issues in output translations.

    2. Richer case marking and suffixes in Indian languages as compared to English.

    Being free-word-order languages, Indian languages suffer badly when morphology and case markers are incorrect.

  • Motivation

    Differences between English and Hindi:

    1. English follows SVO; Hindi follows SOV in general.

    2. English uses post-modifiers, whereas Hindi uses pre-modifiers.

    3. Hindi allows greater freedom in word order, identifying constituents through case marking.

    4. Hindi is relatively richer in morphology.

  • Motivation contd...

    Problems 1 and 2 are addressed by reordering the English sentence into Hindi order in a preprocessing step.

    Here, we focus on solving the 3rd and 4th problems.

  • Motivation (Case Markers)

    Major constituents (subject, object, ...) in English are identified by their positions in the sentence.

    In Hindi, constituents can be moved around without changing the core meaning; case markers and suffixes are used to identify the constituents.

    Example:

    राम ने रावण को मारा । (ram ne ravan ko mara) [Ram killed Ravan]

    रावण को राम ने मारा । (ravan ko ram ne mara) [Ram killed Ravan]

  • Motivation (Morphology)

    Oblique case: लडके पाठशाला गये । लड़कों ने शोर मचाया ।

    Future tense: लडके पाठशाला जायेंगे ।

    Causative form: लड़कों ने उन्हें लाया ।

  • Motivation (Sparsity)

    In the English-Hindi case, sparsity is a big problem. It is very unlikely that all words will appear with all case markers in the training corpus.

    Example: if 'लडके' appears in the training data but 'लड़कों' doesn't, the latter will be treated as an OOV.

  • Approach

    The goal is to carry the semantic and suffix information from the English side to the Hindi side. The factored model acts as a vehicle for this information across languages.

    Factored model (log-linear):

    p(e|f) = (1/Z) exp( ∑_i λ_i h_i(e, f) )

  • Factors used

    Lemma → Lemma: boy → लडक्

    Suffix + Semantics → Case marker / Suffix: -s + subj → ए

    Lemma + Suffix → Surface form: लडक् + ए → लडके
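    A minimal sketch of how these three mappings compose, with made-up one-entry tables (the table contents mirror the slide's boy/लडके example; this is an illustration of the factored pipeline, not the paper's actual model):

```python
# Toy factored pipeline: two translation steps plus one generation step.
lemma_table = {"boy": "लडक्"}                 # step 1: lemma -> lemma
suffix_table = {("-s", "subj"): "ए"}          # step 2: suffix+semantics -> suffix
generation_table = {("लडक्", "ए"): "लडके"}     # step 3: lemma+suffix -> surface form

def translate_word(src_lemma, src_suffix, sem_relation):
    """Map one English (lemma, suffix, relation) triple to a Hindi surface form."""
    tgt_lemma = lemma_table[src_lemma]
    tgt_suffix = suffix_table[(src_suffix, sem_relation)]
    return generation_table[(tgt_lemma, tgt_suffix)]

print(translate_word("boy", "-s", "subj"))  # -> लडके
```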

  • Factored Model

    [Diagram: the reordered input's factors (word, lemma, suffix, semantic relation) are mapped to the output's factors (lemma, suffix / case marker), from which the output word is generated.]

  • Semantic Relations

    The experiments were conducted with two kinds of semantic relations:

    1. Relations from the Universal Networking Language (UNL) (44 relations).

    2. Grammatical relations produced by the Stanford dependency parser (55 relations).

  • Stanford semantic graph

    [Figure: Stanford dependency graph for "John said that he was hit by Jack", with relations nsubj, ccomp, complm, nsubjpass, auxpass, and agent.]

  • UNL semantic graph

    [Figure: UNL graph for "John said that he was hit by Jack", with relations agt and obj, attributes @entry.@past, and scope :01.]

  • Experimental setup

    The SRILM toolkit was used for building the language model.

    Training, tuning, and decoding were done using the Moses toolkit.

    The Stanford dependency parser was used for extracting semantic relations.

    Hindi suffix separation was done using the stemmer described in (Ananthakrishnan and Rao, 2003).

  • Evaluation criteria

    BLEU: measures the precision of n-grams with respect to the reference translations, with a brevity penalty.

    NIST: a variant of BLEU that weights n-grams by their informativeness; it was shown to correlate better with human judgments.

    Subjective: human evaluators judged fluency and adequacy, and counted the number of errors in case markers and morphology.
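    For concreteness, a minimal single-reference BLEU sketch (illustrative only; the official BLEU script handles multiple references and clipping details more carefully):

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Toy single-reference BLEU: modified n-gram precision + brevity penalty."""
    precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i+n]) for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i+n]) for i in range(len(reference) - n + 1))
        overlap = sum(min(c, ref[g]) for g, c in cand.items())   # clipped matches
        total = max(sum(cand.values()), 1)
        precisions.append(max(overlap, 1e-9) / total)            # smooth zeros
    # Brevity penalty: punish candidates shorter than the reference.
    bp = 1.0 if len(candidate) > len(reference) else math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

print(bleu("राम ने रावण को मारा".split(), "राम ने रावण को मारा".split()))  # 1.0
```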

  • Results

    Model                       BLEU    NIST
    Baseline (surface)          24.32   5.85
    lemma + suffix              25.16   5.87
    lemma + suffix + UNL        27.79   6.05
    lemma + suffix + Stanford   28.21   5.99

  • Conclusion and future work

    A marked improvement is observed in English-Hindi SMT when using the factored model.

    The improvement is not only statistically significant but is also verified by subjective evaluation.

    Future work: correctly combining small parts to form a larger output sentence of good quality, since smaller sentences obtain better accuracy.

  • References

    Ananthakrishnan Ramanathan, Hansraj Choudhary, Avishek Ghosh, and Pushpak Bhattacharyya. 2009. Case markers and morphology: Addressing the crux of the fluency problem in English-Hindi SMT. In Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP, pages 800-808, Suntec, Singapore. Association for Computational Linguistics.

    Ananthakrishnan, R., and Rao, D. 2003. A Lightweight Stemmer for Hindi. In Workshop on Computational Linguistics for South-Asian Languages, EACL.

    Koehn, P., and Hoang, H. 2007. Factored Translation Models. In Proceedings of EMNLP.

    Marie-Catherine de Marneffe and Manning, C. 2008. Stanford Typed Dependencies Manual.

  • References

    Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical phrase-based translation. In Proceedings of HLT-NAACL 2003, pages 127-133.

    Andreas Zollmann, Ashish Venugopal, and Stephan Vogel. 2006. Bridging the inflection morphology gap for Arabic statistical machine translation. In Proceedings of the Human Language Technology Conference of the NAACL, Companion Volume: Short Papers (NAACL-Short '06), pages 201-204. Association for Computational Linguistics, Stroudsburg, PA, USA.

    Avramidis, E., and Koehn, P. 2008. Enriching Morphologically Poor Languages for Statistical Machine Translation. In Proceedings of ACL-08: HLT.

    Reshef Shilon, Nizar Habash, Alon Lavie, and Shuly Wintner. 2010. Machine translation between Hebrew and Arabic: Needs, challenges and preliminary solutions. In Proceedings of AMTA 2010: The Ninth Conference of the Association for Machine Translation in the Americas, November.

  • Interpolated Back-off for Factored Translation Models, Philipp Koehn and Barry Haddow

    The Tenth Biennial Conference of the Association for Machine Translation in the Americas, 2012

    Presented by Samiulla, Jayaprakash, Subhash, CS712, 2013

  • Plan

    1) Phrase-based models (limitation)
    2) Factor-based models (model & limitation)
    3) Back-off
    4) Interpolated back-off
    5) Results
    6) Demo

  • Phrase-based models

    1. Pure phrase-based models treat 'house' and 'houses' as completely different words.

    2. 'house' occurring in the training data has no effect on learning the translation of 'houses'.

  • Solutions

    Possible solutions:

    1) Increase the corpus size so that it contains all morphological variants.

    2) Alter the model so that it learns morphological generalizations.

  • Corpus size: increasing the corpus size

    5 → avg. number of target words a source word maps to (synonymy and polysemy)
    10 → avg. sentence length
    1 million → vocabulary size

    Parallel sentences required = 5 × 10 × 1 million = 50 million

    Morphologically rich languages do not stop at this count!


  • Corpus size: increasing the corpus size (morphologically richer)

    A verb is inflected for person (a), number (b), gender (c), aspect (d), tense (e), and voice (f).

    The approximate number of mappings per word will be m = 5 × 2^(a+b+c+d+e+f).

    Nouns are similarly inflected by case.
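    A quick sanity check of the slides' arithmetic, as a minimal sketch (the feature counts a..f below are assumed placeholder values, not numbers given in the slides):

```python
# Back-of-the-envelope corpus estimates from the slides.
avg_mappings = 5          # avg. target words per source word
sent_len = 10             # avg. sentence length
vocab = 1_000_000         # vocabulary size
print(avg_mappings * sent_len * vocab)   # 50,000,000 parallel sentences

# Morphologically rich target: the slide's per-word blow-up,
# m = 5 * 2^(a+b+c+d+e+f). The values of a..f are assumptions.
a, b, c, d, e, f = 2, 2, 2, 2, 2, 2      # person, number, gender, aspect, tense, voice
m = 5 * 2 ** (a + b + c + d + e + f)
print(m)                                  # 20480 mappings per verb under these assumptions
```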

  • Morphological variants: increasing the corpus size (morphologically richer)

    English sentence: (I) was being destroyed

    அழிந்து கொண்டு இருந்தேன் (Tamil)

    azhi-intu kontu irunt-aen (transliteration): azhi = root (destroy), intu = voice marker (past tense), kontu = tense marker (during), irunt- = aspect marker (past progressive), -aen = first person singular.

    azhi-intu kontu irunt-atal: as above, with the final suffix marking feminine.

    azhi-ththu kontu irunt-atharkal: as above, with ththu as the active voice marker.

  • Phrase-based models: alter the model

    Change the phrase-based model so that it can learn inflections/syntax:

    FACTOR-BASED MODEL

  • Factored Model - 1/6

    1) Allows decomposition and enrichment of the phrase-based translation model.

    2) Two translation steps, one for the lemma and one for the morphological properties, and a generation step to produce the target surface form.

  • Factored Model - 2/6

    [Diagram: input factors (word, lemma, morphology) are mapped to output factors (lemma, morphology), from which the output word is generated.]

  • Factored Model - 3/6

    Log-linear model, including several components:

    p(e|f) = exp( ∑_{i=1}^{n} λ_i h_i(e, f) )

    LM: h_lm(e, f) = p_lm(e) = p(e_1) p(e_2|e_1) ... p(e_m|e_{m-1})

    Translation: h_trans(e, f) = p(e|f)

    The translation step is decomposed into many mapping and generation steps.
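    A minimal sketch of this log-linear scoring (the feature functions, weights, and probability tables below are hypothetical toy stand-ins, not Moses internals):

```python
import math

def h_lm(e, f, lm):
    """Bigram LM feature: log p(e1) + sum of log p(e_i | e_{i-1})."""
    logp = math.log(lm[(None, e[0])])
    for prev, cur in zip(e, e[1:]):
        logp += math.log(lm[(prev, cur)])
    return logp

def h_trans(e, f, tm):
    """Toy word-for-word translation feature: sum of log p(e_i | f_i)."""
    return sum(math.log(tm[(fi, ei)]) for fi, ei in zip(f, e))

def score(e, f, lm, tm, weights=(1.0, 1.0)):
    # Unnormalised log score; the argmax over e does not need Z.
    return weights[0] * h_lm(e, f, lm) + weights[1] * h_trans(e, f, tm)

lm = {(None, "राम"): 0.2, ("राम", "आया"): 0.5}
tm = {("ram", "राम"): 0.9, ("came", "आया"): 0.8}
print(score(["राम", "आया"], ["ram", "came"], lm, tm))
```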

  • Factored Model - 4/6

    Introduce latent variables (e_l, e_m) on the English side.

    Instead of summing over all derivations, approximate by taking the best one.

    p(e_s|f_s) = p(e_s | f_s, f_l, f_m): we know the source-side factors (f_s, f_l, f_m); find e_s.

  • Factored Model - 5/6

    Decomposing the fully-factored model into three mapping steps using the chain rule.
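    The decomposition itself did not come through in this transcript. One plausible chain-rule form of the three mapping steps, assuming lemma is mapped first, then morphology, then the surface form is generated (an illustrative reconstruction, not necessarily the slide's exact equation):

    p(e_s, e_l, e_m | f_s, f_l, f_m) = p(e_l | f_l) · p(e_m | f_m, e_l) · p(e_s | e_l, e_m)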

  • Factored Model - 6/6

    With independence assumptions:

    1) The mapping steps' probability distributions are estimated from the word-aligned parallel corpus.

    2) The generation model is estimated from a monolingual target-side corpus.

  • Questions - 1/3

    What if we use a language which is poor in morphology, and we knew that all words in the vocabulary occurred a sufficient number of times in the corpus?

    Who would win: 1) factor-based or 2) phrase-based?

  • Questions - 2/3

    [Diagram: the surface phrase-based model (word → word) alongside the decomposed-factor model (lemma and morphology mapped separately, then the surface form generated).]

    Will the independence assumptions in the factored model harm anything, anywhere?

  • Questions - 3/3

    If we are using a phrase-based model, the phrase table has many translation options for each source phrase.

    Is there any way to find out whether the model has learned enough about a particular phrase?

  • Analysis

    How big is the portion of rare words in the test set, and do they get translated significantly worse?

    Precision by count.

  • Interpolated back-off

    A balance between:

    1) traditional surface-form translation models, and

    2) factored models that decompose translation into lemma and morphological-feature mapping steps.

  • Motivation

    Pure phrase-based models cause sparse-data problems in model estimation, affecting both the translation model and the language model.

    Factor-based models may be harmed by their independence assumptions.

  • Motivation

    Back-off models rely primarily on the surface translation model, but back off to the decomposed model for unknown word forms.

    Interpolated back-off models combine surface and factored translation models, relying more heavily on the surface models for frequent words and more heavily on the factored models for rare words.

  • Back-off

    1) Relies primarily on the phrase-based model.

    2) Only for unknown words and phrases is the factored model used for possible translations.

    3) A third model that relies on synonyms may be introduced to increase coverage.
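    A minimal sketch of the back-off decision, with hypothetical tables (translate, surface_table, and factored_model are illustrative names, not the paper's API):

```python
# Use the surface phrase table when it knows the phrase,
# otherwise fall back to the factored (lemma/morphology) model.
def translate(phrase, surface_table, factored_model):
    if phrase in surface_table:
        return surface_table[phrase]          # primary: surface model
    return factored_model(phrase)             # back-off: factored model

surface_table = {"house": ["veedu"]}
factored = lambda p: ["<factored translation of %s>" % p]
print(translate("house", surface_table, factored))   # surface hit
print(translate("houses", surface_table, factored))  # back-off
```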

  • Interpolated Back-off

    1) The plain back-off model uses the factored model only for unknown surface forms.

    2) The back-off model does not change predictions for rare surface forms (seen once or twice); the factored model plays no role there.

  • Interpolated Back-off

    1) Subtract some probability mass from the translations e in the primary distribution p1(e|f) and use it for additional translations from the secondary distribution p2(e|f).

    2) Obtain α(e|f) by absolute discounting: subtract a fixed discount D from each count.

  • Interpolated Back-off

    Example (D = 0.5):

    f (English)   e (Tamil)             count   p1(e|f)   α(e|f)
    house         veedu (house)         5       0.72      0.64
    house         veettai (house-ACC)   1       0.14      0.07
    house         veettin (of house)    1       0.14      0.07

    The left-over probability mass, 3 × D / 7 = 1.5/7 ≈ 0.21 (i.e., 1 − 0.64 − 0.07 − 0.07, up to rounding), is the weight given to the factored model p2(e|f).
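    A small sketch reproducing this discounting-and-interpolation arithmetic (the factored distribution p2 below is a made-up stand-in):

```python
def interpolate(counts, p2, D=0.5):
    """Absolute discounting: discount each seen count by D, then give the
    freed-up mass to the secondary (factored) distribution p2."""
    total = sum(counts.values())
    alpha = {e: (c - D) / total for e, c in counts.items()}   # discounted primary mass
    leftover = D * len(counts) / total                        # mass handed to p2
    return {e: alpha.get(e, 0.0) + leftover * p2.get(e, 0.0)
            for e in set(counts) | set(p2)}

counts = {"veedu": 5, "veettai": 1, "veettin": 1}
p2 = {"veedu": 0.5, "veettukku": 0.5}     # hypothetical factored distribution
p = interpolate(counts, p2)
print(round(sum(p.values()), 6))          # 1.0: still a proper distribution
print({e: round(v, 3) for e, v in p.items()})
```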

  • Experiments

    1) Training data: European Parliament proceedings and collected news commentaries.

    2) Test set: a collection of news stories.

    3) LM: 5-gram over lemmas, 7-gram over the morphology sequence.

    4) Word alignment: on lemmas, instead of surface forms.


  • Results

    Models compared:

    1. A plain surface phrase-based model that uses only surface forms.

    2. A joint factored model that translates all factors (surface, lemma, morphology) in one translation step.

    3. A back-off model from the joint phrase-based model to the decomposed model.

    4. An interpolated back-off version of the above.

    5. A lemma back-off model from the joint phrase-based model to a model that maps from the source lemma to all target factors.

    6. An interpolated back-off version of the above.

  • Analysis

    1) Quadratmeter (meaning "square meter") occurred 3 times and was correctly translated by interpolated back-off, but not by plain back-off.

    2) The German word Gewalten was translated incorrectly as "violence" by the interpolated back-off model, while the simple back-off model arrived at the right translation, "powers". The word occurred only three times in the corpus, with the acceptable translations "powers", "forces", and "branches", but its singular form Gewalt is very frequent and almost always translates to "violence".

  • Conclusion

    Back-off methods improve the translation of rare words by combining surface word translation with translations obtained from a decomposed factored model.

    Gains in BLEU and improved translation accuracy.

  • Demo

    [Screenshots: translations from the decomposed-factor model vs. the surface model.]


  • Simple Syntactic and Morphological Processing Can Help English-Hindi Statistical Machine Translation

    Ananthakrishnan Ramanathan, Pushpak Bhattacharyya (Indian Institute of Technology)

    Jayprasad Hegde, Ritesh M. Shah, Sasikumar M (CDAC Mumbai)

  • Road Map

    Motivation, Proposed solution, Syntactic processing, Morphological processing, System overview, Experimental evaluation, Results, Conclusion

  • Motivation

    Indian languages differ from English in terms of word order; the cost of reordering leads to not-so-good translations.

    Indian languages are morphologically rich: the parallel corpus should cover a large number of word forms, but large amounts of parallel corpora are unavailable.

    English → हिन्दी SMT: how to achieve reasonable improvement?

  • Proposed solution

    Reorder the English sentence as per Hindi syntax, by applying transformation rules on the English parse tree.

    Make use of the suffixes of Hindi words, using simple suffix separation.

  • Syntactic processing

    Syntactic reordering is performed as a preprocessing stage.

    Transformation rules (a toy sketch follows):
    1. SVO order is converted to SOV.
    2. Post-modifiers are converted to pre-modifiers.
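    A toy sketch of such reordering on a bracketed parse (the single VP rule here is a simplified assumption, not the paper's full transformation-rule set):

```python
# SVO -> SOV reordering on (label, children...) tuples; leaves are words.
def reorder(tree):
    if isinstance(tree, str):
        return tree
    label, *children = tree
    children = [reorder(c) for c in children]
    if label == "VP" and len(children) == 2:   # (V, NP) -> (NP, V)
        children = [children[1], children[0]]
    return (label, *children)

def words(tree):
    if isinstance(tree, str):
        return [tree]
    return [w for c in tree[1:] for w in words(c)]

sent = ("S", ("NP", "Ram"), ("VP", ("V", "killed"), ("NP", "Ravan")))
print(" ".join(words(reorder(sent))))   # Ram Ravan killed
```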

  • Syntactic processing: Example

    [Figure: an English parse tree and its reordered, Hindi-order counterpart.]

  • Morphological processing

    Different morphological forms of a word are not considered independent entities; this deals with data sparsity and results in alignment of morphs instead of word forms.

    Consider a training corpus that contains only one instance of "players":

    English: Players should just play.
    Hindi: खिलाड़ियों को केवल खेलना चाहिए (khilaadiyom ko keval khelanaa caahie)

  • Morphological processing

    Now consider the input sentence:

    English: The men came across some players.
    Expected Hindi translation: आदमियों को कुछ खिलाड़ी मिले (aadmiyon ko kuch khiladii mile)

    If morphology is not used, the system will choose खिलाड़ियों for "players".

  • Tools for morphological information extraction

    Morphological analyzer for English (Minnen et al., 2001).

    Suffix separation program for Hindi (Ananthakrishnan and Rao, 2003): extracts the longest possible suffix for each word (a sketch of the idea follows).
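    A minimal sketch of the longest-suffix-separation idea (the suffix list below is a tiny illustrative sample, not the stemmer's actual table):

```python
# Try the longest suffixes first; keep a minimum stem length to avoid
# over-stripping short words.
SUFFIXES = sorted(["ों", "ें", "ां", "ाएं", "ियों", "े", "ी", "ा"],
                  key=len, reverse=True)

def split_suffix(word, min_stem=2):
    for suf in SUFFIXES:
        if word.endswith(suf) and len(word) - len(suf) >= min_stem:
            return word[: -len(suf)], suf
    return word, ""

print(split_suffix("लडकों"))   # ('लडक', 'ों')
print(split_suffix("लडके"))    # ('लडक', 'े')
```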

  • System overview

    [System diagram; from Ananthakrishnan, R., Bhattacharyya, P., Hegde, J. J., Shah, R. M., and Sasikumar, M., Simple Syntactic and Morphological Processing Can Help English-Hindi Statistical Machine Translation, Proceedings of IJCNLP, 2008.]

  • Experimental evaluation

    Corpus:

                          #sentences   #words
    Training              5000         120153
    Development           483          11675
    Test                  400          8557
    Monolingual (Hindi)   49937        1123966

    BLEU, mWER, and SSER are used for evaluation.

  • Evaluation

    mWER (multi-reference word error rate): edit distance to the most similar reference translation (a sketch follows).

    SSER (subjective sentence error rate): based on human judgment of each translation: 0 - nonsense, 1 - roughly understandable, 2 - understandable, 3 - good, 4 - perfect. The lower the SSER, the better the translation.

    Sonja Nießen, Franz Josef Och, Gregor Leusch, and Hermann Ney. An Evaluation Tool for Machine Translation: Fast Evaluation for MT Research. International Conference on Language Resources and Evaluation, pages 39-45, 2000.
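    A minimal mWER sketch (illustrative; not the exact tool of Nießen et al., 2000):

```python
# Levenshtein distance to the closest reference, normalised by that
# reference's length.
def edit_distance(a, b):
    d = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        prev, d[0] = d[0], i
        for j, y in enumerate(b, 1):
            prev, d[j] = d[j], min(d[j] + 1, d[j - 1] + 1, prev + (x != y))
    return d[len(b)]

def mwer(hypothesis, references):
    hyp = hypothesis.split()
    return min(edit_distance(hyp, r.split()) / len(r.split()) for r in references)

print(mwer("राम ने रावण को मारा",
           ["राम ने रावण को मारा", "रावण को राम ने मारा"]))  # 0.0
```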

  • Evaluation: SSER

    The SSER is computed from the human scores, where v(s, t) is the score assigned to translation t of sentence s, n is the number of translation pairs, and K is the number of evaluation classes.

  • Baseline

    Phrase-based model (Koehn et al., 2003), where f is the source sentence, e is the translation, and ê is the highest-probability translation:

    ê = argmax_e p(e|f)

    which can be rewritten using Bayes' decision rule as

    ê = argmax_e p(e) p(f|e)

    The translation model p(f|e) is computed using a phrase translation probability distribution.

  • Baseline: phrase translation table

    The parallel corpus is word-aligned, and phrase correspondences are found. Given the set of phrase pairs (f̄, ē), the phrase translation probability is estimated by relative frequency:

    φ(f̄|ē) = count(f̄, ē) / ∑_{f̄'} count(f̄', ē)

    The language model p(e) is a trigram model with modified Kneser-Ney smoothing.
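    A minimal sketch of this relative-frequency estimation over extracted phrase pairs (the pair list is a made-up miniature, not real extraction output):

```python
from collections import Counter

# phi(f|e) = count(f, e) / sum over f' of count(f', e)
phrase_pairs = [("the house", "घर"), ("the house", "घर"),
                ("the house", "मकान"), ("house", "घर")]

pair_counts = Counter(phrase_pairs)
e_counts = Counter(e for _, e in phrase_pairs)

def phi(f, e):
    return pair_counts[(f, e)] / e_counts[e]

print(phi("the house", "घर"))   # 2/3
print(phi("house", "घर"))       # 1/3
```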

  • Results

    Technique                 BLEU    mWER    SSER    Roughly understandable   Understandable
    baseline (phrase-based
    translation) (Koehn
    et al., 2003)             12.10   77.49   91.20   10%                      0%
    baseline + syn            16.90   69.18   74.40   42%                      12%
    baseline + syn + morph    15.88   70.69   66.40   46%                      28%

  • Conclusion

    Incorporating syntactic and morphological information increases translation quality.

    The approach is useful for English-to-Indian-language translation in general.

  • References

    Ananthakrishnan, R., Bhattacharyya, P., Hegde, J. J., Shah, R. M., and Sasikumar, M. 2008. Simple Syntactic and Morphological Processing Can Help English-Hindi Statistical Machine Translation. In Proceedings of IJCNLP.

    Philipp Koehn, Franz Josef Och, and Daniel Marcu. 2003. Statistical Phrase-based Translation. In Proceedings of HLT-NAACL.

    Sonja Nießen, Franz Josef Och, Gregor Leusch, and Hermann Ney. 2000. An Evaluation Tool for Machine Translation: Fast Evaluation for MT Research. International Conference on Language Resources and Evaluation, pages 39-45.