
Morpho Challenge competition 2005-2010

Evaluations and results

Authors: Mikko Kurimo, Sami Virpioja, Ville Turunen, Krista Lagus


Introduction

• Started in 2005.

• Open to all.

• The organizers selected the evaluation tasks, data, and metrics, and performed all the evaluations.

• Covers unsupervised and semi-supervised approaches.

• The semi-supervised approach was introduced in Morpho Challenge 2010.


Aim

• To develop language-independent algorithms that discover morphemes from text material.

• Morpheme: the smallest grammatical unit in a language.

• To promote research in machine learning and natural language processing (NLP).


Evaluation tasks & languages

# From: Mikko Kurimo, Sami Virpioja, Ville Turunen, Krista Lagus. 2010. Morpho Challenge 2005-2010: Evaluations and Results.

Year   Languages added             Tasks added
2005   English, Turkish, Finnish   Word segmentation, speech recognition
2007   German                      Information retrieval (IR)
2008   Arabic                      Context IR
2009   -                           Machine translation
2010   -                           Semi-supervised approach


Word Segmentation

• In 2005:

    • Segment the text into morphemes.

• In 2007:

    • Locate the surface forms (word segmentation).

    • Identify which surface forms are allomorphs of the same underlying morpheme.


Principles for segmentation

1. The evaluation is based on a subset of the word forms given as training data.

2. The frequency of the word form plays no role in evaluation.

3. The evaluation score is balanced F-measure, the harmonic mean of precision and recall.

4. If the linguistic gold standard has several alternative analyses for one word, it is enough for full precision that one of the alternatives is equivalent to the proposed analysis.
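The four principles can be sketched in code. The example below is a minimal illustration, assuming that an analysis is scored by its morpheme boundary positions, that counts are pooled over all evaluated word types, and that the gold alternative sharing the most boundaries with the proposal is the one used. The function names and the toy data are hypothetical and are not taken from the official evaluation scripts.

```python
def boundaries(segmentation):
    """Return the set of split positions in a '+'-separated segmentation."""
    positions, offset = set(), 0
    for morph in segmentation.split("+")[:-1]:
        offset += len(morph)
        positions.add(offset)
    return positions


def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def evaluate(proposed, gold):
    """Score proposed segmentations against gold-standard alternatives.

    proposed: dict mapping word -> segmentation string, e.g. "hand+s"
    gold:     dict mapping word -> list of alternative gold segmentations
    Each word type counts once, so word frequency plays no role.
    """
    hits = n_proposed = n_gold = 0
    for word, alternatives in gold.items():
        pred = boundaries(proposed[word])
        # Assumption: use the alternative that shares the most boundaries.
        best = max((boundaries(a) for a in alternatives),
                   key=lambda g: (len(pred & g), -len(g)))
        hits += len(pred & best)
        n_proposed += len(pred)
        n_gold += len(best)
    precision = hits / n_proposed if n_proposed else 1.0
    recall = hits / n_gold if n_gold else 1.0
    return precision, recall, f_measure(precision, recall)


# Toy data (assumption): three word types, one with two gold alternatives.
PROPOSED = {"hands": "hand+s", "handful": "handful", "lefthanded": "left+hand+ed"}
GOLD = {
    "hands": ["hand+s"],
    "handful": ["hand+ful"],
    "lefthanded": ["left+hand+ed", "lefthand+ed"],
}
precision, recall, f_score = evaluate(PROPOSED, GOLD)
```

In this toy run every proposed boundary is correct, but the boundary in "handful" is missed, so precision is 1.0 and recall is 0.75.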


Information retrieval

• The algorithms were tested by using the morpheme segmentations for text retrieval.

• Conventionally, a stemming algorithm is used to reduce inflected words to their base forms.

• Problem: stemming algorithms are language-specific.

• Challenges:

    • Choosing the correct weighting method.

    • The number of queries was limited.
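As a rough illustration of using morpheme segmentations for text retrieval, the sketch below replaces each word by its morphemes before indexing and ranks documents with a simple TF-IDF style weight. The segmentation table, the documents, and the weighting formula are all assumptions chosen for brevity; the slides do not specify which retrieval engine or weighting scheme the evaluation used.

```python
import math
from collections import Counter

# Hypothetical segmentations, standing in for the output of an algorithm.
SEGMENTATIONS = {
    "hands": ["hand", "s"],
    "handful": ["hand", "ful"],
    "walked": ["walk", "ed"],
    "walking": ["walk", "ing"],
}


def to_morphemes(text):
    """Replace each word by its morphemes; unknown words stay whole."""
    terms = []
    for word in text.lower().split():
        terms.extend(SEGMENTATIONS.get(word, [word]))
    return terms


def rank(query, documents):
    """Return document indices ordered by a TF-IDF style overlap score."""
    doc_terms = [Counter(to_morphemes(d)) for d in documents]
    query_terms = set(to_morphemes(query))
    n_docs = len(documents)
    scores = []
    for index, terms in enumerate(doc_terms):
        score = 0.0
        for term in query_terms:
            df = sum(1 for t in doc_terms if term in t)  # document frequency
            if df:
                score += terms[term] * math.log(1 + n_docs / df)
        scores.append((score, index))
    # Highest score first; ties fall back to document order.
    return [index for _, index in sorted(scores, key=lambda s: (-s[0], s[1]))]


DOCS = ["she walked home", "a handful of hands", "walking is healthy"]
ranking = rank("walking", DOCS)
```

Because "walking" and "walked" share the morpheme "walk", the query also matches the first document, which an exact word match would miss.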


Machine translation

• Two stages:

    • Alignment of parallel sentences in both languages.

    • Training a language model.

• In Morpho Challenge 2009 the focus was on the alignment problem.
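To make the alignment stage concrete, here is a minimal sketch of IBM Model 1 trained with EM on morpheme-segmented text. IBM Model 1 is swapped in as a well-known, simple alignment model; the slides do not say which alignment method the evaluation used. The three Finnish-English pairs and the function name `train_ibm1` are illustrative assumptions, and the sketch omits the NULL source token for brevity.

```python
from collections import defaultdict


def train_ibm1(pairs, iterations=5):
    """Estimate t[(f, e)] = P(target token f | source token e) with EM."""
    target_vocab = {f for _, target in pairs for f in target}
    t = defaultdict(lambda: 1.0 / len(target_vocab))  # uniform start
    for _ in range(iterations):
        count = defaultdict(float)
        total = defaultdict(float)
        for source, target in pairs:
            for f in target:
                # E-step: share each target token among the source tokens.
                norm = sum(t[(f, e)] for e in source)
                for e in source:
                    fraction = t[(f, e)] / norm
                    count[(f, e)] += fraction
                    total[e] += fraction
        # M-step: renormalise the expected counts per source token.
        t = defaultdict(float, {(f, e): c / total[e] for (f, e), c in count.items()})
    return t


# Toy data (assumption): Finnish words already split into morphemes.
PAIRS = [
    (["talo", "ssa"], ["in", "house"]),
    (["auto", "ssa"], ["in", "car"]),
    (["talo"], ["house"]),
]
t = train_ibm1(PAIRS)
```

With the words split into morphemes, the case ending "ssa" gets its own translation probabilities, so the model can learn to link it to "in" separately from the stems.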


Some Algorithms

• Bernhard (Bernhard, 2006):

    • Best in the linguistic evaluation for Finnish, English and German.

    • First, a list of prefixes and suffixes is extracted.

    • Candidate segmentations are generated using this list.

    • The best segmentation is selected on the basis of a cost function.
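The three steps above (affix list, candidate generation, cost-based selection) can be illustrated with a toy sketch. This is not Bernhard's actual procedure: the affix lists and stem counts are hard-coded here rather than extracted from a corpus, at most one prefix and one suffix are considered, and the cost function (a smoothed negative log frequency of the stem) is an assumption made for illustration.

```python
import math

# Hypothetical affix lists and stem counts (assumptions for illustration).
PREFIXES = {"re", "un"}
SUFFIXES = {"s", "ed", "ful", "ing"}
STEM_COUNTS = {"hand": 50, "play": 40, "replay": 5}
TOTAL = sum(STEM_COUNTS.values())


def candidates(word):
    """Generate (prefix, stem, suffix) splits allowed by the affix lists."""
    splits = []
    for prefix in [""] + [p for p in PREFIXES if word.startswith(p)]:
        rest = word[len(prefix):]
        for suffix in [""] + [s for s in SUFFIXES if rest.endswith(s)]:
            stem = rest[:len(rest) - len(suffix)]
            if stem:  # never allow an empty stem
                splits.append((prefix, stem, suffix))
    return splits


def cost(split):
    """Toy cost: smoothed negative log relative frequency of the stem."""
    _, stem, _ = split
    return -math.log((STEM_COUNTS.get(stem, 0) + 0.01) / TOTAL)


def best_segmentation(word):
    """Pick the lowest-cost candidate and drop empty affix slots."""
    prefix, stem, suffix = min(candidates(word), key=cost)
    return [part for part in (prefix, stem, suffix) if part]


segmentation = best_segmentation("replayed")
```

Here "replayed" has four candidates, and the split with the most frequent stem, "play", has the lowest cost.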


Some Algorithms

• Morfessor algorithm:

    • Aims to discover the simplest and most compact description of the data.

    • Substrings that occur frequently in the training set are also treated as morphemes.

    • Example: hand, hand+s, hand+ful, left+hand+ed.

    • Gives better results than the other algorithms for Finnish and Turkish.

# From: Mathias Creutz and Krista Lagus. 2006. Morfessor in the Morpho Challenge.
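The idea of a compact description can be made concrete with a two-part cost: the bits needed to store the morph lexicon plus the bits needed to encode the corpus using that lexicon. The sketch below is a simplified illustration of this principle, not Morfessor's actual model; the flat 5 bits per letter, the end-of-morph marker, and the function name `description_length` are assumptions.

```python
import math
from collections import Counter

BITS_PER_LETTER = 5  # assumption: flat cost to store one letter of a lexicon entry


def description_length(analyses):
    """Two-part cost in bits: lexicon storage plus corpus encoding."""
    tokens = [morph for morphs in analyses for morph in morphs]
    counts = Counter(tokens)
    # Lexicon: every distinct morph is spelled out once, plus an end marker.
    lexicon_bits = sum(BITS_PER_LETTER * (len(m) + 1) for m in counts)
    # Corpus: each token costs -log2 of its relative frequency.
    corpus_bits = -sum(c * math.log2(c / len(tokens)) for c in counts.values())
    return lexicon_bits + corpus_bits


# The example words from the slide, unsegmented and segmented.
WHOLE = [["hand"], ["hands"], ["handful"], ["lefthanded"]]
SPLIT = [["hand"], ["hand", "s"], ["hand", "ful"], ["left", "hand", "ed"]]

whole_bits = description_length(WHOLE)
split_bits = description_length(SPLIT)
```

Under these assumptions the segmented analysis is cheaper because "hand" is stored once and reused four times, which is the intuition behind treating frequent substrings as morphemes.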


Results: Morpho Challenge 2010

Language   Method            Precision   Recall   F-measure   Type
English    Morfessor S+W     65.62%      69.28%   67.40%      S
Finnish    DEAP MDL-NOCAT    56.03%      70.71%   62.52%      S
German     Morfessor U+W     58.55%      44.94%   50.85%      P
Turkish    Morfessor S+W+L   71.69%      59.97%   65.31%      S

• S = semi-supervised algorithm

• P = unsupervised algorithm with supervised parameter tuning

# From http://research.ics.aalto.fi/events/morphochallenge2010
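As a quick consistency check, the reported F-measures can be recomputed from the precision and recall columns using the harmonic mean. The dictionary below simply copies the figures from the table; the 0.01 tolerance is an assumption that allows for rounding in the published values.

```python
# (precision, recall, reported F-measure) in percent, copied from the table.
RESULTS = {
    "English": (65.62, 69.28, 67.40),
    "Finnish": (56.03, 70.71, 62.52),
    "German": (58.55, 44.94, 50.85),
    "Turkish": (71.69, 59.97, 65.31),
}


def f_measure(precision, recall):
    """Balanced F-measure: the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Absolute difference between the recomputed and the reported F-measure.
differences = {
    language: abs(f_measure(p, r) - reported)
    for language, (p, r, reported) in RESULTS.items()
}
```

All four recomputed values agree with the reported F-measures to within rounding.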


Open Challenges

• What is the best analysis algorithm?

• What is the meaning of the morphemes?

• How to evaluate the alternative analyses?

• How to improve the analysis using context?

• How to effectively apply semi-supervised learning?


References

• Mikko Kurimo, Sami Virpioja, Ville Turunen, Krista Lagus. 2010. Morpho Challenge 2005-2010: Evaluations and Results. Proceedings of the 11th Meeting of the ACL Special Interest Group on Computational Morphology and Phonology.

• Mathias Creutz and Krista Lagus. 2006. Morfessor in the Morpho Challenge. Proceedings of the PASCAL Challenge Workshop on Unsupervised Segmentation of Words into Morphemes.

• Official site of Morpho Challenge: http://research.ics.aalto.fi/events/morphochallenge2010/

• Wikipedia: http://en.wikipedia.org/


Thank You