morpho challenge competition 2005-2010 evaluations and results authors mikko kurimo

Morpho Challenge competition 2005-2010

Evaluations and results

Authors: Mikko Kurimo, Sami Virpioja, Ville Turunen, Krista Lagus

Introduction

• Started in 2005.

• Open to all.

• The organizers selected the evaluation tasks, data, and metrics, and performed all the evaluations.

• Covered unsupervised and semi-supervised approaches.

• The semi-supervised approach was introduced in Morpho Challenge 2010.

Aim

• To develop language-independent algorithms that discover morphemes from text material.

• Morpheme: the smallest grammatical unit in a language.

• To promote research in machine learning and NLP.

Evaluation tasks & languages

# From Mikko Kurimo, Sami Virpioja, Ville Turunen, Krista Lagus. 2010. Morpho Challenge 2005-2010: Evaluations and Results.

Year  Languages added            Tasks added
2005  English, Turkish, Finnish  Word segmentation, speech recognition
2007  German                     Information retrieval (IR)
2008  Arabic                     Context IR
2009  -                          Machine translation
2010  -                          Semi-supervised approach

Word Segmentation

• In 2005:
  • Segment the text into morphemes.

• In 2007:
  • Locate the surface forms (word segmentation).
  • Identify which surface forms are allomorphs of the same underlying morpheme.

Principles for segmentation

1. The evaluation is based on a subset of the word forms given as training data.

2. The frequency of the word form plays no role in evaluation.

3. The evaluation score is the balanced F-measure, the harmonic mean of precision and recall.

4. If the linguistic gold standard has several alternative analyses for one word, it is enough for full precision that one of the alternatives is equivalent to the proposed analysis.
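
The four principles can be illustrated with a small sketch. It is not the official Morpho Challenge evaluation script: it assumes a simple boundary-matching scheme, and the names `boundaries`, `f_measure`, and `evaluate` are made up for this example. Each word form is counted once (frequency plays no role), and when the gold standard offers alternatives, the one that agrees best with the proposed analysis is used.

```python
def boundaries(segmentation):
    """Return the character offsets at which a morpheme boundary falls."""
    cuts, pos = set(), 0
    for morph in segmentation[:-1]:
        pos += len(morph)
        cuts.add(pos)
    return cuts


def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(proposed, gold):
    """proposed: word -> list of morphs.
    gold: word -> list of alternative analyses (each a list of morphs)."""
    hits = proposed_total = gold_total = 0
    for word, alternatives in gold.items():  # every word form counts once
        pred = boundaries(proposed[word])
        # Assumption: use the alternative sharing the most boundaries with
        # the proposal, preferring the one with fewer boundaries on a tie.
        best = max(
            alternatives,
            key=lambda alt: (len(pred & boundaries(alt)), -len(boundaries(alt))),
        )
        ref = boundaries(best)
        hits += len(pred & ref)
        proposed_total += len(pred)
        gold_total += len(ref)
    precision = hits / proposed_total if proposed_total else 1.0
    recall = hits / gold_total if gold_total else 1.0
    return precision, recall, f_measure(precision, recall)


proposed = {"hands": ["hand", "s"], "lefthanded": ["left", "handed"]}
gold = {
    "hands": [["hand", "s"]],
    "lefthanded": [["left", "hand", "ed"], ["left", "handed"]],
}
print(evaluate(proposed, gold))
```

Because the second gold alternative for "lefthanded" matches the proposal exactly, the proposal is not penalized for missing the boundary before "ed".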

Information retrieval

• The algorithms were tested by using the morpheme segmentations for text retrieval.

• Conventionally, a stemming algorithm is used to reduce inflected words to base words.

• Problem: stemming algorithms are language-specific.

• Challenges:
  • Choosing the correct weighting method.
  • The number of queries was limited.
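
The idea of using morpheme segmentations for retrieval can be sketched as follows. This is only an illustration under assumptions: the `SEGMENTATIONS` table, the `to_morphs` and `rank` helpers, and the plain overlap score are hypothetical and are not the retrieval setup used in the competition.

```python
from collections import Counter

# Hypothetical segmentations, as an unsupervised analyzer might output them.
SEGMENTATIONS = {
    "hands": ["hand", "s"],
    "handful": ["hand", "ful"],
    "lefthanded": ["left", "hand", "ed"],
    "books": ["book", "s"],
}


def to_morphs(text):
    """Replace each word by its morphs; unknown words are kept whole."""
    morphs = []
    for word in text.lower().split():
        morphs.extend(SEGMENTATIONS.get(word, [word]))
    return morphs


def rank(query, documents):
    """Rank documents by the number of morph occurrences shared with the query."""
    query_counts = Counter(to_morphs(query))
    scored = []
    for doc_id, text in documents.items():
        doc_counts = Counter(to_morphs(text))
        score = sum(min(query_counts[m], doc_counts[m]) for m in query_counts)
        if score > 0:
            scored.append((score, doc_id))
    # Highest score first; ties broken by document id for determinism.
    return [doc_id for score, doc_id in sorted(scored, key=lambda x: (-x[0], x[1]))]


documents = {
    "d1": "a handful of books",
    "d2": "lefthanded writers",
    "d3": "cats",
}
print(rank("hands", documents))
```

Here "d1" outranks "d2" partly because the very common suffix "s" also matches, which hints at why the choice of weighting method is listed as a challenge.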

Machine translation

• Two stages:
  • Alignment of parallel sentences in both languages.
  • Training a language model.

• In Morpho Challenge 2009, the focus was on the alignment problem.

Some Algorithms

• Bernhard (Bernhard, 2006):
  • Best in the Finnish, English, and German linguistic evaluations.
  • First, a list of prefixes and suffixes is extracted.
  • Segmentations are generated using this list.
  • The best segmentation is selected on the basis of a cost function.
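
The three steps above (affix list, candidate generation, cost-based selection) can be sketched in a toy form. This is not Bernhard's actual algorithm: the affix lists, the morph counts, the penalty for unseen morphs, and the cost function are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical affix lists; in the real method they are extracted from data.
PREFIXES = {"un", "re"}
SUFFIXES = {"s", "ed", "ing", "ful"}

# Hypothetical morph frequencies used by the toy cost function.
MORPH_COUNTS = {"hand": 40, "s": 200, "ed": 120, "un": 30, "lock": 25, "ful": 15}
TOTAL = sum(MORPH_COUNTS.values())
UNSEEN_COST = 20.0  # assumed penalty, in bits, for a morph never seen before


def candidates(word):
    """Generate splits of the form [prefix] + stem + [suffix]."""
    result = []
    for prefix in [""] + [p for p in PREFIXES if word.startswith(p)]:
        rest = word[len(prefix):]
        for suffix in [""] + [s for s in SUFFIXES if rest.endswith(s)]:
            stem = rest[:len(rest) - len(suffix)]
            if stem:  # never allow an empty stem
                result.append([m for m in (prefix, stem, suffix) if m])
    return result


def cost(segmentation):
    """Toy cost: negative log2 probability of each morph, summed."""
    total = 0.0
    for morph in segmentation:
        count = MORPH_COUNTS.get(morph)
        total += -math.log2(count / TOTAL) if count else UNSEEN_COST
    return total


def best_segmentation(word):
    """Select the candidate with the lowest cost."""
    return min(candidates(word), key=cost)


print(best_segmentation("unlocked"))
print(best_segmentation("hands"))
```

Under these assumed counts, splitting into known, frequent morphs is cheaper than keeping an unseen whole word, so the fully segmented candidates win.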

Some Algorithms

• Morfessor algorithm:
  • Aims to discover the simplest and most compact description of the data.
  • Substrings that occur frequently in the training set are treated as morphemes.
  • Example: hand, hand+s, hand+ful, left+hand+ed.
  • Gives better results than other algorithms for Finnish and Turkish.

# From Mathias Creutz, Krista Lagus. 2006. Morfessor in the Morpho Challenge.
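
The "compact description" idea can be made concrete with a simplified two-part code length: the cost of storing the morph lexicon plus the cost of encoding the corpus with it. This is a sketch only, not Morfessor's actual cost function; the flat `BITS_PER_CHAR` value and the `description_length` helper are assumptions made for this example.

```python
import math
from collections import Counter

BITS_PER_CHAR = 5.0  # assumed flat cost for storing one letter of a morph


def description_length(analyses):
    """analyses: one segmentation (list of morphs) per word form.
    Returns lexicon cost plus corpus cost, in bits."""
    tokens = [morph for seg in analyses for morph in seg]
    counts = Counter(tokens)
    total = len(tokens)
    # Lexicon: every distinct morph is spelled out once (+1 for an end marker).
    lexicon_cost = sum((len(m) + 1) * BITS_PER_CHAR for m in counts)
    # Corpus: every morph token is encoded with -log2 of its relative frequency.
    corpus_cost = -sum(c * math.log2(c / total) for c in counts.values())
    return lexicon_cost + corpus_cost


unsegmented = [["hand"], ["hands"], ["handful"], ["lefthanded"]]
segmented = [["hand"], ["hand", "s"], ["hand", "ful"], ["left", "hand", "ed"]]

print(description_length(unsegmented))
print(description_length(segmented))
```

With these assumptions the segmented analysis is cheaper, because the frequent substring "hand" is stored once in the lexicon instead of being repeated inside four separate entries.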

Results: Morpho Challenge 2010

Language  Method           Precision  Recall  F-measure  Type
English   Morfessor S+W    65.62%     69.28%  67.40%     S
Finnish   DEAP MDL-NOCAT   56.03%     70.71%  62.52%     S
German    Morfessor U+W    58.55%     44.94%  50.85%     P
Turkish   Morfessor S+W+L  71.69%     59.97%  65.31%     S

• S = semi-supervised algorithm

• P = unsupervised algorithm with supervised parameter tuning

# From http://research.ics.aalto.fi/events/morphochallenge2010
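
As a quick consistency check, the reported F-measures can be recomputed as the harmonic mean of the reported precision and recall. The `RESULTS` list simply copies the figures from the 2010 results; the `f_measure` helper and the rounding tolerance are assumptions of this sketch.

```python
# (language, method, precision %, recall %, reported F-measure %)
RESULTS = [
    ("English", "Morfessor S+W", 65.62, 69.28, 67.40),
    ("Finnish", "DEAP MDL-NOCAT", 56.03, 70.71, 62.52),
    ("German", "Morfessor U+W", 58.55, 44.94, 50.85),
    ("Turkish", "Morfessor S+W+L", 71.69, 59.97, 65.31),
]


def f_measure(precision, recall):
    """Balanced F-measure: harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


for language, method, precision, recall, reported in RESULTS:
    computed = f_measure(precision, recall)
    # Allow for rounding of the published figures to two decimals.
    agrees = abs(computed - reported) < 0.01
    print(f"{language:8s} {method:16s} computed={computed:.2f} "
          f"reported={reported:.2f} agrees={agrees}")
```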

Open Challenges

• What is the best analysis algorithm?

• What is the meaning of the morphemes?

• How to evaluate the alternative analyses?

• How to improve the analysis using context?

• How to effectively apply semi-supervised learning?

References

• Mikko Kurimo, Sami Virpioja, Ville Turunen, Krista Lagus. 2010. Morpho Challenge 2005-2010: Evaluations and Results. Proceedings of the 11th Meeting of the ACL Special Interest Group on Computational Morphology and Phonology.

• Mathias Creutz and Krista Lagus. 2006. Morfessor in the Morpho Challenge. Proceedings of the PASCAL Challenge Workshop on Unsupervised Segmentation of Words into Morphemes.

• Official site of Morpho Challenge: http://research.ics.aalto.fi/events/morphochallenge2010/

• Wikipedia: http://en.wikipedia.org/

Thank You
