Identification of Fertile Translations in Comparable Corpora: a Morpho-Compositional Approach

Estelle Delpech (1), Béatrice Daille (1), Emmanuel Morin (1), Claire Lemaire (2,3)
(1) LINA, Université de Nantes; (2) GREMUTS, Université de Grenoble; (3) Lingua et Machina
AMTA'12, 10/31/12, San Diego, CA

DESCRIPTION

Material presented at the Tenth Biennial Conference of the Association for Machine Translation in the Americas (AMTA 2012), San Diego, CA. Download paper at http://hal.archives-ouvertes.fr/hal-00730325. Institutions: Laboratoire d'Informatique de Nantes Atlantique (LINA), Lingua et Machina, GREMUTS

TRANSCRIPT

Page 1: Identification of Fertile Translations in Comparable Corpora: a Morpho-Compositional Approach

Identification of Fertile Translations in Comparable Corpora

a Morpho-Compositional Approach

Estelle Delpech (1), Béatrice Daille (1), Emmanuel Morin (1), Claire Lemaire (2,3)

(1) LINA, Université de Nantes; (2) GREMUTS, Université de Grenoble; (3) Lingua et Machina

AMTA'12 10/31/12 San Diego, CA

Page 2: Identification of Fertile Translations in Comparable Corpora: a Morpho-Compositional Approach

Outline

1 Context and original problem

2 Compositional translation framework

3 Detailed translation method

4 Experiments and results

5 Future work

Page 4: Identification of Fertile Translations in Comparable Corpora: a Morpho-Compositional Approach

Context

Research partly funded by a Computer-Aided Translation company

Goal: generate domain-specific bilingual lexicons when no parallel data is available

Available data:
- general-language bilingual dictionary
- domain-specific comparable corpora

Page 8: Identification of Fertile Translations in Comparable Corpora: a Morpho-Compositional Approach

Comparable corpora

Definition of comparable corpora

Set of texts in languages L1 and L2 which are not translations of each other but which deal with the same subject matter, so that it is still possible to extract translation pairs

Some difficulties:
- language in target texts is not influenced by source texts
- mixed text types: technical, scientific, lay science...

⇒ do not expect parallelism in source ↔ target structures

⇒ need to deal with variation in translation

Page 14: Identification of Fertile Translations in Comparable Corpora: a Morpho-Compositional Approach

Variation in translation

Morphological variation:
- anticancer (Noun) → anticancéreux (Adj) 'anticancerous'
⇒ use of morphological derivation rules / lexicons

Lexical variation:
- radiosensitivity → Radiotoleranz 'radiotolerance'
  sensitivity ≈ tolerance
⇒ use of synonyms, thesaurus

Fertility:
- bi-dimensional → deux dimensions 'two dimensions'
⇒ scarcely addressed

Page 21: Identification of Fertile Translations in Comparable Corpora: a Morpho-Compositional Approach

Fertility

Definition: the target term has more words than the source term

Semantic fertility: the target term has more morphemes than the source term
- voie de glace 'route of ice' → ice climbing route
- aquarelle (not decomposable) → water color

Surface fertility: the target and source terms have the same number of morphemes
- bi-dimensional → deux dimensions 'two dimensions'
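The two kinds of fertility defined on this slide can be sketched as a small classifier. This is only an illustration: the `Term` structure and the `fertility_type` function are hypothetical names, and the morpheme segmentations are supplied by hand (counting lexical morphemes only), which is an assumption rather than something the slides specify.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Term:
    """A term with a hand-supplied segmentation (hypothetical structure)."""
    words: Tuple[str, ...]
    morphemes: Tuple[str, ...]


def fertility_type(source: Term, target: Term) -> str:
    """Classify a source/target pair using the slide's definitions."""
    # Fertile translation: the target has more words than the source.
    if len(target.words) <= len(source.words):
        return "not fertile"
    if len(target.morphemes) > len(source.morphemes):
        return "semantic fertility"
    if len(target.morphemes) == len(source.morphemes):
        return "surface fertility"
    # More words but fewer morphemes: a case the slide does not name.
    return "fertile (unclassified)"


# Segmentations below are assumptions made for the example.
bi_dimensional = Term(("bi-dimensional",), ("bi", "dimensional"))
deux_dimensions = Term(("deux", "dimensions"), ("deux", "dimensions"))
aquarelle = Term(("aquarelle",), ("aquarelle",))
water_color = Term(("water", "color"), ("water", "color"))

print(fertility_type(bi_dimensional, deux_dimensions))  # surface fertility
print(fertility_type(aquarelle, water_color))           # semantic fertility
```

In practice the morpheme counts would come from a morphological analyser; the classifier itself only compares counts.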

Page 26: Identification of Fertile Translations in Comparable Corpora: a Morpho-Compositional Approach

Compositional translation

Principle of compositionality

“the meaning of the whole is a function of the meaning of the parts” [Keenan and Faltz, 1985, 24-25]

Definition of compositional translation

The translation of the whole is a function of the translation of the parts

Page 28: Identification of Fertile Translations in Comparable Corpora: a Morpho-Compositional Approach

Compositional translation process

1 Decomposition
- “cytotoxic” → {cyto, toxic}

2 Translation
- {cyto, toxic} → {cyto, toxique}

3 Recomposition
- {cyto, toxique} → {cytotoxique, toxiquecyto}

4 Selection
- {cytotoxique, toxiquecyto} → “cytotoxique”
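The four steps above can be sketched as a toy pipeline. Everything here is an assumption made for illustration: the morpheme list, the morpheme dictionary, and the target vocabulary are invented, the greedy longest-match splitter is a stand-in for a real morphological analyser, and selecting candidates by checking whether they are attested in a target vocabulary is one plausible reading of the "Selection" step, not necessarily the authors' procedure.

```python
from itertools import permutations, product

# Toy resources, invented for this example only.
MORPHEMES = {"cyto", "toxic"}
MORPHEME_DICT = {"cyto": ["cyto"], "toxic": ["toxique"]}
TARGET_VOCABULARY = {"cytotoxique", "cellule", "toxique"}


def decompose(word, morphemes=MORPHEMES):
    """Step 1: greedy longest-match split into known morphemes (None on failure)."""
    parts, i = [], 0
    while i < len(word):
        match = next(
            (word[i:j] for j in range(len(word), i, -1) if word[i:j] in morphemes),
            None,
        )
        if match is None:
            return None
        parts.append(match)
        i += len(match)
    return parts


def translate(parts, dictionary=MORPHEME_DICT):
    """Step 2: every combination of the parts' dictionary translations."""
    return list(product(*(dictionary.get(p, []) for p in parts)))


def recompose(translated):
    """Step 3: concatenate the translated parts in every possible order."""
    return sorted({"".join(p) for combo in translated for p in permutations(combo)})


def select(candidates, vocabulary=TARGET_VOCABULARY):
    """Step 4: keep only candidates attested in the target vocabulary."""
    return [c for c in candidates if c in vocabulary]


def compositional_translation(word):
    parts = decompose(word)
    if parts is None:
        return []
    return select(recompose(translate(parts)))


print(decompose("cytotoxic"))                  # ['cyto', 'toxic']
print(recompose(translate(["cyto", "toxic"])))  # ['cytotoxique', 'toxiquecyto']
print(compositional_translation("cytotoxic"))   # ['cytotoxique']
```

Generating every permutation and filtering afterwards keeps the recomposition step free of language-specific ordering rules, at the cost of producing candidates such as "toxiquecyto" that the selection step must discard.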

Page 33: Identification of Fertile Translations in Comparable Corpora: a Morpho-Compositional Approach

Relevance of compositional translation

More than 60% of terms in technical and scientific domains are morphologically complex [Namer and Baud, 2007]

Outperforms the distributional approach for the translation of terms with compositional meaning [Morin and Daille, 2009]

Page 36: Identification of Fertile Translations in Comparable Corpora: a Morpho-Compositional Approach

Related work: single-word terms translation

[Cartoni, 2009]
Prefixed word → prefixed word
- ri+organizzare → ré+organiser 'reorganize'

[Harastani et al., 2012]
Neoclassical compound → neoclassical compound
- Kalori+metrie → calori+métrie 'calorimetry'

[Weller et al., 2011]
Noun compound → noun phrase
- Elektronen+mikroskop → electron microscope

⇒ Restricted to a small set of source-to-target structures

⇒ Fertility handled in the specific case of noun compounds

Page 47: Identification of Fertile Translations in Comparable Corpora: a Morpho-Compositional Approach

Contribution I

Addressing fertility by allowing translation equivalences from bound morpheme to autonomous lexical item:
- cyto → cellule 'cell'
- cytotoxic → toxique (pour les) cellules 'toxic to the cells'
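One way to picture this contribution is a morpheme table in which a bound morpheme may map to a free word, combined with a search that tolerates a few function words between the translated parts. The sketch below is an assumption-laden illustration, not the authors' implementation: `MORPHEME_TRANSLATIONS`, `STOP_WORDS`, `max_gap`, and the example sentence are all invented, inflected forms are listed directly instead of being lemmatised, and only multi-word candidates are handled.

```python
import re
from itertools import permutations, product

# Hypothetical table: "cyto" maps to a bound morpheme and to free words.
MORPHEME_TRANSLATIONS = {
    "cyto": ["cyto", "cellule", "cellules"],
    "toxic": ["toxique"],
}
STOP_WORDS = {"pour", "les", "de", "des", "aux"}  # assumed short list


def fertile_candidates(parts):
    """Every ordering of every combination of translations, as word tuples."""
    options = [MORPHEME_TRANSLATIONS[p] for p in parts]
    return {perm for combo in product(*options) for perm in permutations(combo)}


def find_in_text(candidate, text, max_gap=2):
    """Return the first token span containing the candidate words in order,
    with at most `max_gap` stop words between consecutive words, else None."""
    tokens = re.findall(r"\w+", text.lower())
    for start in range(len(tokens)):
        pos, matched = start, []
        for word in candidate:
            gap = 0
            # Between two candidate words, skip up to max_gap stop words.
            while (matched and gap < max_gap and pos < len(tokens)
                   and tokens[pos] in STOP_WORDS):
                matched.append(tokens[pos])
                pos += 1
                gap += 1
            if pos >= len(tokens) or tokens[pos] != word:
                break
            matched.append(word)
            pos += 1
        else:
            return " ".join(matched)
    return None


text = "Ce composé est toxique pour les cellules tumorales."  # invented sentence
hits = sorted(
    hit
    for hit in (find_in_text(c, text) for c in fertile_candidates(["cyto", "toxic"]))
    if hit is not None
)
print(hits)  # ['toxique pour les cellules']
```

Allowing stop words between the translated parts is what lets a two-morpheme source word align with a four-word target phrase.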

Page 51: Identification of Fertile Translations in Comparable Corpora: a Morpho-Compositional Approach

Context and original problemCompositional translation framework

Detailed translation methodExperiments and results

Future work

Underlying principle and advantagesRelated workContribution

Contribution II

Larger variety of input/output structures:

SOURCE: prefixed word, neoclassical compound, suffixed word, compound, any combination

=⇒

TARGET: prefixed word, neoclassical compound, suffixed word, compound, any combination, phrase

Overview

1 Decomposition: lexicons + heuristic rules

2 Translation: dictionary look-up

3 Recomposition: permutations

4 Selection: search occurrences in target texts

Decomposition - step 1

Split source term into minimal components with heuristic rules:

- split on hyphens
- match substrings of the source term with:
  - a list of morphemes (prefixes, confixes, suffixes)
  - a list of lexical items
- respect some length constraints on the substrings

non-cytotoxic → {non, cyto, toxic}
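
The heuristic rules above can be sketched roughly as follows. This is only an illustration, not the authors' implementation: the function names, the shortest-match-first strategy and the `min_len` threshold are assumptions, since the slides do not specify the actual length constraints or how competing segmentations are resolved.

```python
def segment(chunk, known, min_len=3):
    """Return a list of known substrings that exactly covers chunk, or None."""
    if not chunk:
        return []
    # Try the shortest admissible head first so that components stay minimal.
    for end in range(min_len, len(chunk) + 1):
        head = chunk[:end]
        if head in known:
            rest = segment(chunk[end:], known, min_len)
            if rest is not None:
                return [head] + rest
    return None


def decompose(term, morphemes, lexical_items, min_len=3):
    """Split a source term into minimal components (hypothetical sketch)."""
    known = set(morphemes) | set(lexical_items)
    components = []
    for chunk in term.lower().split("-"):  # rule 1: split on hyphens
        parts = segment(chunk, known, min_len)
        # Keep the chunk whole when no full segmentation is found.
        components.extend(parts if parts is not None else [chunk])
    return components


components = decompose("non-cytotoxic", {"non", "cyto"}, {"toxic"})
```

With these toy lists, `components` holds the three minimal components of the slide example.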

Decomposition - step 2

Generate all possible concatenations of the minimal components:

{non, cyto, toxic} → {non, cyto, toxic}, {noncyto, toxic}, {non, cytotoxic}, {noncytotoxic}

⇒ Increases the chances of matching the components with entries of the dictionaries
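
One way to enumerate these concatenations is to decide, for each boundary between two adjacent components, whether to merge or keep it, which yields 2^(n-1) groupings for n components. The sketch below assumes that only adjacent components are merged and that their order is preserved, as in the example; the function name is hypothetical.

```python
from itertools import product


def concatenations(components):
    """All ways of merging adjacent components while keeping their order."""
    if not components:
        return []
    results = []
    # One boolean per boundary: True merges the component into the previous group.
    for joins in product([False, True], repeat=len(components) - 1):
        grouped = [components[0]]
        for component, join in zip(components[1:], joins):
            if join:
                grouped[-1] += component
            else:
                grouped.append(component)
        results.append(grouped)
    return results


groupings = concatenations(["non", "cyto", "toxic"])
```

For the three components of the example, `groupings` contains the four decompositions listed above.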

Translation through direct dictionary look-up

Bilingual dictionary for lexical items:
- toxic → toxique

Morpheme translation table for bound morphemes:
- -cyto- → -cyto-, cellule

{-cyto-, toxic} → {-cyto-, toxique}, {cellule, toxique}
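
The look-up step amounts to a Cartesian product over the translations of each component. A minimal sketch, assuming both resources are plain Python dictionaries mapping a source component to a list of target components (the data layout and the function name are illustrative, not taken from the original system):

```python
from itertools import product


def translate(components, dictionary, morpheme_table):
    """Translate each component and combine the alternatives."""
    options = []
    for component in components:
        targets = morpheme_table.get(component, []) + dictionary.get(component, [])
        if not targets:
            return []  # one untranslatable component blocks this decomposition
        options.append(targets)
    return [list(combo) for combo in product(*options)]


dictionary = {"toxic": ["toxique"]}
morpheme_table = {"-cyto-": ["-cyto-", "cellule"]}
candidates = translate(["-cyto-", "toxic"], dictionary, morpheme_table)
```

Here `candidates` contains the two component sequences of the slide example. Because the morpheme table may map a bound morpheme to an autonomous word such as "cellule", this is the step where fertile translations become possible.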

Translation with variation

Morphological lexicon:
- toxique → toxicité ’toxicity’

Synonyms:
- toxique → vénéneux ’poisonous’

{-cyto-, toxic} → {-cyto-, toxicité}, {-cyto-, vénéneux}, {cellule, toxicité}, {cellule, vénéneux}
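
Variation can be layered on top of the direct translations by expanding every target component with its morphological relatives and synonyms. The sketch below is an assumption about how this could be done; it keeps the direct translation alongside the variants, whereas the example above lists only the added variants.

```python
from itertools import product


def expand_with_variants(candidates, morph_families, synonyms):
    """Add morphological and synonym variants of each translated component."""
    expanded = []
    for candidate in candidates:
        options = []
        for word in candidate:
            variants = [word] + morph_families.get(word, []) + synonyms.get(word, [])
            options.append(list(dict.fromkeys(variants)))  # de-duplicate, keep order
        expanded.extend(list(combo) for combo in product(*options))
    return expanded


direct = [["-cyto-", "toxique"], ["cellule", "toxique"]]
varied = expand_with_variants(
    direct,
    morph_families={"toxique": ["toxicité"]},
    synonyms={"toxique": ["vénéneux"]},
)
```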

Recomposition - step 1

Permute the target components:

{-cyto-, toxique} → {-cyto-, toxique}, {toxique, -cyto-}

Recreate target words by generating all possible concatenations of the components:

{-cyto-, toxique} → {cyto toxique}, {cytotoxique}
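
Both operations can be combined in a few lines: every ordering of the components is generated, and each boundary is either closed up or filled with a space. This is a sketch under the assumption that the hyphens around bound morphemes are only a notation and are dropped when the components are joined; the function name is hypothetical.

```python
from itertools import permutations, product


def recompose(components):
    """Generate candidate target strings from translated components."""
    if not components:
        return set()
    candidates = set()
    for order in permutations(components):
        forms = [c.strip("-") for c in order]  # drop the bound-morpheme hyphens
        # Each boundary is either closed up ("") or kept as a word break (" ").
        for separators in product(["", " "], repeat=len(forms) - 1):
            text = forms[0]
            for separator, form in zip(separators, forms[1:]):
                text += separator + form
            candidates.add(text)
    return candidates


forms = recompose(["-cyto-", "toxique"])
```

For two components this gives four strings, two per ordering. Many of them are not well formed, which is why a filtering step is needed afterwards.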

Recomposition - step 2

Filter out impossible target terms:
- e.g. “cyto” is a bound morpheme and cannot occur as an autonomous item

{cyto toxique}, {cytotoxique} → {cytotoxique}
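
The filter illustrated here can be expressed as a check that no bound morpheme stands alone as a token. A minimal sketch, assuming candidates are whitespace-separated strings and bound morphemes are written with surrounding hyphens (other filtering rules the original system may apply are not covered):

```python
def filter_candidates(candidates, bound_morphemes):
    """Drop candidates in which a bound morpheme appears as a standalone word."""
    bound = {m.strip("-") for m in bound_morphemes}
    return {
        candidate
        for candidate in candidates
        if not any(token in bound for token in candidate.split())
    }


kept = filter_candidates({"cyto toxique", "cytotoxique"}, {"-cyto-"})
```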

Selection

Match target term with the words of the target corpus

Allow at most 3 stop words between two words

{toxique cellule} → “toxique pour les cellules” ’toxic to the cells’
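
The matching rule can be sketched as a scan over the target corpus that skips up to three stop words between consecutive words of the candidate. The sketch assumes the corpus is available as (surface form, lemma) pairs and that matching is done on lemmas, which would explain why "cellule" matches "cellules"; the slides do not state this, and the stop-word list below is a toy example.

```python
def find_occurrences(term_words, tokens, stop_words, max_gap=3):
    """Return the surface strings in tokens that realise term_words.

    tokens is a list of (surface, lemma) pairs; up to max_gap stop words
    may occur between two consecutive words of the term.
    """
    if not term_words:
        return []
    matches = []
    for start in range(len(tokens)):
        pos, matched = start, True
        for i, word in enumerate(term_words):
            if i > 0:
                gap = 0
                # Skip stop words, but never skip the word we are looking for.
                while (pos < len(tokens) and gap < max_gap
                       and tokens[pos][1] in stop_words
                       and tokens[pos][1] != word):
                    pos += 1
                    gap += 1
            if pos >= len(tokens) or tokens[pos][1] != word:
                matched = False
                break
            pos += 1
        if matched:
            matches.append(" ".join(surface for surface, _ in tokens[start:pos]))
    return matches


corpus = [("est", "être"), ("toxique", "toxique"), ("pour", "pour"),
          ("les", "le"), ("cellules", "cellule")]
found = find_occurrences(["toxique", "cellule"], corpus, {"pour", "le", "de"})
```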

Corpora

English, French, German

breast cancer

approx. 400k words per language

1/2 scientific papers + 1/2 lay science

POS-tagged with the Xelda software (http://www.temis.com)

Comparability [Bo and Gaussier, 2010], from 0 (unrelated) to 1 (perfectly comparable):
- English-French: 0.71
- English-German: 0.45

Source terms

Morphologically constructed words collected from the English texts

None of them has a translation in the general language dictionary which is attested in the target texts
- English to French: 1839 source terms
- English to German: 1824 source terms

Resources for translation

General language dictionary (Xelda)

Domain-specific dictionary: cognates extracted from corpus [Hauer and Kondrak, 2011]

Morpheme translation table (hand-crafted)

Synonyms (Xelda)

Morphological families [Porter, 1980]

Evaluation measures I

Coverage

\[
C = \frac{\sum_{i=1}^{|ST|} \sigma(ST_i)}{|ST|},
\qquad
\sigma(ST_i) =
\begin{cases}
1 & \text{if } |Trans(ST_i)| \geq 1 \\
0 & \text{else}
\end{cases}
\]

⇒ % of source terms with at least 1 translation (regardless of its accuracy)
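
As a small illustration of the coverage measure, the sketch below assumes the system output is stored as a dictionary from each source term to its list of generated translations; that data layout and the toy terms are hypothetical.

```python
def coverage(translations):
    """Share of source terms that received at least one translation."""
    if not translations:
        return 0.0
    covered = sum(1 for candidates in translations.values() if len(candidates) >= 1)
    return covered / len(translations)


output = {
    "cytotoxic": ["cytotoxique", "toxique pour les cellules"],
    "non-cytotoxic": [],
    "postoperative": ["postopératoire"],
    "oestrogen-sensitive": [],
}
c = coverage(output)
```

Two of the four toy source terms have at least one translation, so `c` is 0.5 whether or not those translations are correct.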

Evaluation measures II

Precision

\[
P = \frac{|Exact|}{|Trans|}
\]

⇒ % of generated translations which are exact translations

Overall quality

OQ = C × P

⇒ trade-off between precision and coverage
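The three measures can be sketched in a few lines of Python. This is only an illustration of the definitions on these slides, not the authors' evaluation code: the function names, the dictionary layout (source term mapped to its list of generated translations) and the toy data, including which terms are left without a candidate, are assumptions made for the example.

```python
def coverage(translations):
    """C: share of source terms with at least one generated translation."""
    if not translations:
        return 0.0
    covered = sum(1 for cands in translations.values() if len(cands) >= 1)
    return covered / len(translations)


def precision(translations, exact):
    """P: share of generated translations that were judged exact."""
    generated = [(src, tgt) for src, cands in translations.items() for tgt in cands]
    if not generated:
        return 0.0
    return sum(1 for pair in generated if pair in exact) / len(generated)


def overall_quality(translations, exact):
    """OQ = C x P."""
    return coverage(translations) * precision(translations, exact)


# Toy data (hypothetical): two of four source terms receive a candidate,
# and one of the two candidates is judged exact.
toy_translations = {
    "cardiotoxicity": ["toxicité cardiaque"],
    "in-patient": ["pas malade"],
    "immunoscore": [],
    "bloodstream": [],
}
toy_exact = {("cardiotoxicity", "toxicité cardiaque")}

print(coverage(toy_translations))                    # 0.5
print(precision(toy_translations, toy_exact))        # 0.5
print(overall_quality(toy_translations, toy_exact))  # 0.25
```

Because OQ is a product, a resource that raises coverage only pays off if precision does not fall by a larger factor.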

Experiments

combination of linguistic resources

quality of the lexicon with and without the fertile translations

Results: English → French

                        C            P            OQ
                     -f    +f     -f    +f     -f    +f
Gen.+Morph.          .04   .12    .81   .57    .03   .07
Gen.+Morph. +S       .05   .15    .69   .50    .03   .08
Gen.+Morph. +M       .11   .23    .20   .28    .02   .06
Gen.+Morph. +D       .16   .26    .70   .60    .11   .16
Gen.+Morph. +SMD     .24   .39    .31   .33    .07   .13

avg. gain              +11         -8.6         +4.8
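The "avg. gain" row appears to be the mean difference between the +f and -f columns over the five resource combinations, expressed in percentage points. The sketch below reproduces it from the English → French figures above. The reading of the row, the variable names and the data layout are assumptions, not taken from the slides.

```python
# (without fertile, with fertile) scores per resource combination,
# copied from the English -> French results.
EN_FR = {
    "Gen.+Morph.":      {"C": (0.04, 0.12), "P": (0.81, 0.57), "OQ": (0.03, 0.07)},
    "Gen.+Morph. +S":   {"C": (0.05, 0.15), "P": (0.69, 0.50), "OQ": (0.03, 0.08)},
    "Gen.+Morph. +M":   {"C": (0.11, 0.23), "P": (0.20, 0.28), "OQ": (0.02, 0.06)},
    "Gen.+Morph. +D":   {"C": (0.16, 0.26), "P": (0.70, 0.60), "OQ": (0.11, 0.16)},
    "Gen.+Morph. +SMD": {"C": (0.24, 0.39), "P": (0.31, 0.33), "OQ": (0.07, 0.13)},
}


def average_gain(rows, measure):
    """Mean (+f minus -f) difference for one measure, in percentage points."""
    diffs = [with_f - without_f for without_f, with_f in
             (scores[measure] for scores in rows.values())]
    return round(100 * sum(diffs) / len(diffs), 1)


for measure in ("C", "P", "OQ"):
    print(measure, average_gain(EN_FR, measure))
# C 11.0
# P -8.6
# OQ 4.8
```

Under this reading, adding fertile translations for French trades roughly 9 points of precision for 11 points of coverage, with a net gain in overall quality.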

Results: English → German

                        C            P            OQ
                     -f    +f     -f    +f     -f    +f
Gen.+Morph.          .06   .13    .80   .35    .05   .05
Gen.+Morph. +S       .08   .16    .69   .31    .05   .05
Gen.+Morph. +M       .12   .22    .40   .23    .05   .05
Gen.+Morph. +D       .17   .26    .65   .39    .11   .10
Gen.+Morph. +SMD     .24   .36    .43   .27    .10   .10

avg. gain             +9.2        -28.4         -0.2

Discussion: English-French vs. English-German results

English-German corpus is much less comparable (0.45 vs. 0.71)

Morphological types:

German, a Germanic language: tendency to agglutination
oestrogen-independent → Östrogen-unabhängige

French, a Romance language: creates phrases more easily
oestrogen-independent → indépendant des œstrogènes

Error analysis

Problems in word reordering

- self-examination → untersuchung selbst 'examination self'

Wrong or inappropriate translations

- in-patient → pas malade 'not ill'

in → "inside" → inside patient
in → "inverse" → not a patient
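Both error types follow naturally if candidates are built by translating each component and then trying every ordering. The toy sketch below illustrates that mechanism. It is an assumption about the generation step, which these slides do not spell out, and the table entries and function name are hypothetical.

```python
from itertools import permutations, product

# Hypothetical translation options per source component.
# "in" is ambiguous: 'inside' (dans) vs. negation (pas).
TABLE = {
    "in": ["dans", "pas"],
    "patient": ["malade", "patient"],
    "self": ["selbst"],
    "examination": ["untersuchung"],
}


def candidates(components, table):
    """All orderings of all combinations of component translations."""
    options = [table[c] for c in components]
    result = []
    for choice in product(*options):       # one translation per component
        for order in permutations(choice):  # every word order
            candidate = " ".join(order)
            if candidate not in result:
                result.append(candidate)
    return result


print(candidates(["self", "examination"], TABLE))
# ['selbst untersuchung', 'untersuchung selbst']
print("pas malade" in candidates(["in", "patient"], TABLE))
# True
```

Nothing in this generation step prefers 'selbst untersuchung' over 'untersuchung selbst', or the 'inside' reading of "in" over the negation reading, so both the misordered and the wrong-sense candidates are produced alongside the plausible ones.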

Future work

Improve quality of linguistic resources

- morphological derivation rules instead of stemming
- use of a thesaurus

Try translation patterns instead of permutations

Rank translations

Thank you for your attention.

[email protected]@univ-nantes.fr

[email protected]@lingua-et-machina.com

ADDITIONAL SLIDES

Exact translations

Non-fertile:
- pathophysiological → physiopathologique
- overactive → überaktiv

Fertile:
- cardiotoxicity → toxicité cardiaque 'cardiac toxicity'
- mastectomy → ablation der brust 'ablation of the breast'

Morphological variants

Non-fertile:
- dosimetry → dosimétrique 'dosimetric'
- radiosensitivity → strahlenempfindlich 'radiosensitive'

Fertile:
- milk-producing → production de lait 'production of milk'
- self-examination → selbst untersuchen 'self examine'

Inexact but semantically related

Non-fertile:
- oncogene → oncogenèse 'oncogenesis'
- breakthrough → durchbrechen 'break'

Fertile:
- chemoradiotherapy → chemotherapie oder strahlen 'chemotherapy or radiation'
- treatable → pouvoir le traiter 'can treat it'

Wrong translations

Non-fertile:
- immunoscore → immunomarquer 'immunostain'
- check-in → unkontrollieren 'uncontrolled'

Fertile:
- bloodstream → fliessen mehr blut 'more blood flow'
- risk-reducing → risque de réduire 'risk of reducing'

References I

Bo, L. and Gaussier, E. (2010). Improving corpus comparability for bilingual lexicon extraction from comparable corpora. In 23rd International Conference on Computational Linguistics, pages 23–27, Beijing, China.

Cartoni, B. (2009). Lexical morphology in machine translation: A feasibility study. In Proceedings of the 12th Conference of the European Chapter of the ACL, pages 130–138, Athens, Greece.

Harastani, R., Daille, B., and Morin, E. (2012). Neoclassical compound alignments from comparable corpora. In Proceedings of the 13th International Conference on Computational Linguistics and Intelligent Text Processing, volume 2, pages 72–82, New Delhi, India.

Hauer, B. and Kondrak, G. (2011). Clustering semantically equivalent words into cognate sets in multilingual lists. In Proceedings of the 5th International Joint Conference on Natural Language Processing, pages 865–873, Chiang Mai, Thailand.

Keenan, E. L. and Faltz, L. M. (1985). Boolean semantics for natural language. D. Reidel, Dordrecht, Holland.

Morin, E. and Daille, B. (2009). Compositionality and lexical alignment of multi-word terms. In Language Resources and Evaluation (LRE), volume 44 of Multiword expression: hard going or plain sailing, pages 79–95. P. Rayson, S. Piao, S. Sharoff, S. Evert, B. Villada Moirón, Springer Netherlands edition.

Namer, F. and Baud, R. (2007). Defining and relating biomedical terms: Towards a cross-language morphosemantics-based system. International Journal of Medical Informatics, 76(2-3):226–33.

References II

Porter, M. F. (1980). An algorithm for suffix stripping. Program, 14(3):130–137.

Weller, M., Gojun, A., Heid, U., Daille, B., and Harastani, R. (2011). Simple methods for dealing with term variation and term alignment. In Proceedings of the 9th International Conference on Terminology and Artificial Intelligence, pages 87–93, Paris, France.