A Study of Association Measures and their Combination for Arabic MWT Extraction

10th International Conference on Terminology and Artificial Intelligence (TIA’2013). Abdelkader El Mahdaouy, Said El Alaoui Ouatik and Eric Gaussier. October 28th, 2013.

DESCRIPTION

Automatic Multi-Word Term (MWT) extraction is important to many applications, such as information retrieval, question answering, and text categorization. Although many methods have been used for MWT extraction in English and other European languages, few studies have addressed Arabic. In this paper, we propose a novel hybrid method that combines linguistic and statistical approaches for Arabic Multi-Word Term extraction. The main contribution of our method is to consider contextual information together with both termhood and unithood in the association measure used at the statistical filtering step. In addition, our technique takes into account the problem of MWT variation in the linguistic filtering step. The performance of the proposed statistical measure (NLC-value) is evaluated on an Arabic environment corpus by comparing it with existing competitors. Experimental results show that our NLC-value measure outperforms the others in terms of precision for both bi-grams and tri-grams.

TRANSCRIPT

Page 1: A Study of Association Measures and their Combination for Arabic MWT Extraction

Introduction
The state of MWT extraction
Proposed Method
Evaluation and Results
Conclusion and perspectives
Bibliography

A Study of Association Measures and their Combination for Arabic MWT Extraction

10th International Conference on Terminology and Artificial Intelligence (TIA’2013)

Abdelkader El Mahdaouy, Said El Alaoui Ouatik and Eric Gaussier

October 28th, 2013

A. El Mahdaouy, S.O El Alaoui and E. Gaussier Arabic MWT Extraction 1 / 20

Page 2: A Study of Association Measures and their Combination for Arabic MWT Extraction

Table of contents

1 Introduction
    Terminology Extraction
    Motivation
2 The state of MWT extraction
    Standard Approaches
    Statistical Measures
3 Proposed Method
    Linguistic Filter
    Statistical Filter
4 Evaluation and Results
    Corpus
    Evaluation Method
    Obtained results
5 Conclusion and perspectives
6 Bibliography

Page 3: A Study of Association Measures and their Combination for Arabic MWT Extraction

Terminology Extraction

Terminology
Set of terms representing the system of concepts of a particular subject field.

Term
Lexical unit that has an unambiguous meaning when used in a text of a specific domain.
Refers to a defined concept ... (ISO 704).

Terminology Extraction
Subtask of information extraction.
Automatically extracts relevant terms from a given corpus.

Page 5: A Study of Association Measures and their Combination for Arabic MWT Extraction

Motivation

The bag-of-words model (based on single-word terms) is a simplifying representation used in natural language processing and information retrieval (IR).
Multi-word terms (MWTs) are less ambiguous and less polysemous than single-word terms.
Using MWTs instead of single-word terms yields a better representation of document content.

Page 6: A Study of Association Measures and their Combination for Arabic MWT Extraction

Standard Approaches

Linguistic Approaches
Based on linguistic pre-processing and POS tagging.
Extract candidate terms using syntactic patterns.

Statistical Approaches
Rank candidate terms with a measure that gives higher scores to "good" candidate terms.
Frequent expressions are assumed to represent important concepts.

Hybrid Approaches
Combine linguistic and statistical techniques to extract MWTs in order to avoid the weaknesses of the two approaches.

Page 9: A Study of Association Measures and their Combination for Arabic MWT Extraction

Characteristics of MWTs

Defined by Kageura et al., 1996:

Unithood
The degree of strength or stability of syntagmatic combinations or collocations.
Measures: Log-Likelihood Ratio, T-Score, MI, etc.

Termhood
The degree to which a linguistic unit is related to a specific domain concept.
Measures: C/NC-value.
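As an illustration of a unithood measure, the Log-Likelihood Ratio of a bi-gram can be computed from a 2x2 contingency table of co-occurrence counts. The sketch below follows the usual G² formulation of Dunning's statistic; the function name and the way the counts are passed in are assumptions made for this example, not something the slides specify.

```python
import math

def llr(k11, k12, k21, k22):
    """Log-likelihood ratio (G^2) for a 2x2 contingency table.

    k11: occurrences of (w1, w2)      k12: w1 followed by another word
    k21: another word followed by w2  k22: bi-grams containing neither
    """
    def xlogx_sum(*counts):
        # Sum of k * ln(k), treating 0 * ln(0) as 0.
        return sum(k * math.log(k) for k in counts if k > 0)

    n = k11 + k12 + k21 + k22
    cells = xlogx_sum(k11, k12, k21, k22)
    rows = xlogx_sum(k11 + k12, k21 + k22)
    cols = xlogx_sum(k11 + k21, k12 + k22)
    return 2.0 * (cells - rows - cols + xlogx_sum(n))

# Independent words score (numerically) zero; strongly associated words score high.
print(llr(10, 10, 10, 10))
print(llr(10, 0, 0, 10))
```

A higher score means the two words co-occur more often than independence would predict, which is the sense in which LLR measures unithood.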

Page 11: A Study of Association Measures and their Combination for Arabic MWT Extraction

Proposed Method

The hybrid method consists of two filters:

Linguistic Filter
Use AMIRA 2.0 (POS tagging toolkit).
Extract MWT candidates based on syntactic patterns.
Handle the problem of MWT variation.

Statistical Filter
Propose a novel statistical measure (NLC-value) that combines context information with termhood and unithood.
Evaluate state-of-the-art statistical measures.

Page 13: A Study of Association Measures and their Combination for Arabic MWT Extraction

Linguistic Filter

The proposed linguistic filter extracts candidate MWTs using two core components: the POS tagger and the sequence identifier.

Syntactic patterns
(Noun+ (Noun|Adj)+ | (Noun|Adj)+ | (Noun|Adj))
Noun Prep Noun

Figure 1: The global schema of the linguistic filter
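A sequence identifier of this kind can be sketched as a regular expression over POS tags. The example below uses a deliberately simplified reading of the patterns (a noun followed by one or more nouns or adjectives, or Noun Prep Noun). The tag names, the one-letter codes and the English placeholder tokens are assumptions for illustration and do not reflect AMIRA's actual output format.

```python
import re

# Hypothetical one-letter codes for coarse POS tags (not AMIRA's real tag set).
TAG_CODES = {"Noun": "N", "Adj": "A", "Prep": "P"}

# Simplified patterns: Noun (Noun|Adj)+  or  Noun Prep Noun.
PATTERN = re.compile(r"N[NA]+|NPN")

def extract_candidates(tagged):
    """Return the token sequences whose POS tags match PATTERN.

    tagged: list of (token, tag) pairs; any other tag breaks a sequence.
    Only the longest non-overlapping matches are returned.
    """
    codes = "".join(TAG_CODES.get(tag, "x") for _, tag in tagged)
    return [
        tuple(token for token, _ in tagged[m.start():m.end()])
        for m in PATTERN.finditer(codes)
    ]

sentence = [
    ("water", "Noun"), ("pollution", "Noun"), ("is", "Verb"),
    ("barrel", "Noun"), ("of", "Prep"), ("oil", "Noun"),
]
print(extract_candidates(sentence))
# [('water', 'pollution'), ('barrel', 'of', 'oil')]
```

Encoding each tag as one character keeps the pattern readable and lets match offsets map directly back to token positions.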

Page 14: A Study of Association Measures and their Combination for Arabic MWT Extraction

Term variation

Four types of variation are handled: graphical variants, inflectional variants, morpho-syntactic variants and syntactic variants.

Graphical Variants
Orthographic errors that occur in writing certain letters ("ي", "ة" and "أ").

Example
التنوع البيولوجى, which leads to التنوع البيولوجي, meaning "Biodiversity".
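Graphical variants are commonly conflated by normalizing the affected letters before candidates are counted. The mapping below is an assumed, conventional normalization table. The slides name the letters concerned but do not give the exact rules, so treat it as a sketch only.

```python
# Assumed normalization table (not taken from the slides).
GRAPHICAL_MAP = str.maketrans({
    "\u0649": "\u064a",  # alif maqsura -> ya
    "\u0629": "\u0647",  # ta marbuta   -> ha
    "\u0623": "\u0627",  # alif with hamza above -> bare alif
    "\u0625": "\u0627",  # alif with hamza below -> bare alif
    "\u0622": "\u0627",  # alif with madda       -> bare alif
})

def normalize_graphical(term):
    """Map graphical variants of a term to a single normalized spelling."""
    return term.translate(GRAPHICAL_MAP)
```

Two candidates that differ only in these letters then share one normalized key, so their frequencies can be summed.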

Page 16: A Study of Association Measures and their Combination for Arabic MWT Extraction

Term variation

Inflectional Variants
These variants are due to the use of different forms of the words constituting an MWT:
The gender and the number.
The presence/absence of a definite article.

Examples
1. تلوث المحيط (ocean pollution), which leads to تلوث المحيطات (pollution of the oceans).
2. تلوث المياه (water pollution), which leads to تلوث مياه (the water pollution).

Page 18: A Study of Association Measures and their Combination for Arabic MWT Extraction

Term variation

Morpho-syntactic Variants
These variants affect the internal structure of the term, as the words it contains are related through derivational morphology:
Noun1 Noun2 ⇔ Noun1 Adj
Noun1 Adj ⇔ Noun1 Prep Noun

Examples
1. تلوث الهواء and تلوث الهوائي (air pollution).
2. برميل نفطي, which leads to برميل من النفط (barrel of oil).

Page 20: A Study of Association Measures and their Combination for Arabic MWT Extraction

Term variation

Syntactic Variants
These variants modify the structure of the MWT candidate by adding one or more words (such as adjectives) but do not affect the grammatical categories:
Noun1 Noun2 ⇔ Noun1 Noun2 Adj
Noun1 Adj1 ⇔ Noun1 Adj1 Adj2

Examples
1. مخزونات المياه (Water stocks) and مخزونات المياه الجوفية (Groundwater stocks).
2. منظمة الصحة (Health Organization) and منظمة الصحة العالمية (World Health Organization).
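The two syntactic-variant rules share one shape: the longer candidate is the shorter one followed only by extra adjectives. A minimal check for that relation might look as follows. The function name, the tag labels and the English glosses standing in for Arabic tokens are all illustrative assumptions.

```python
def is_syntactic_variant(base, extended):
    """True if `extended` is `base` followed only by extra adjectives.

    Both arguments are lists of (token, tag) pairs in corpus word order.
    """
    if len(extended) <= len(base) or extended[:len(base)] != base:
        return False
    return all(tag == "Adj" for _, tag in extended[len(base):])

# English glosses in Arabic word order: "organization health (world)".
base = [("organization", "Noun"), ("health", "Noun")]
extended = base + [("world", "Adj")]
print(is_syntactic_variant(base, extended))  # True
```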

Page 22: A Study of Association Measures and their Combination for Arabic MWT Extraction

Statistical Filter: The NLC-value

NLC-value

\[ \text{NLC-value}(a) = 0.8 \cdot \text{LC-value}(a) + 0.2 \cdot \text{N-value}(a) \qquad (1) \]

with

\[ \text{LC-value}(a) = \begin{cases} \log_2(|a|) \cdot FL(a) & \text{if } a \text{ is not nested,} \\ \log_2(|a|) \cdot \left( FL(a) - \dfrac{1}{|T(a)|} \sum_{b \in T(a)} FL(b) \right) & \text{otherwise,} \end{cases} \]

and

\[ FL(a) = f(a) \cdot \ln\bigl(2 + \min(LLR(a))\bigr), \qquad \text{N-value}(a) = \sum_{b \in C_a} f_a(b) \cdot \frac{|T(b)|}{n} \]

1. |a| denotes the length in words of candidate term a.
2. f(a) is the number of occurrences of a.
3. T(a) denotes the set of longer candidate terms in which a appears.
4. |T(a)| is the cardinality of the set T(a).
5. C_a denotes the set of distinct context words of a.
6. f_a(b) corresponds to the number of times b occurs in the context of a.
7. n is the total number of terms considered.
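The formulas above can be transcribed almost literally into code. The sketch below is one possible reading, not the authors' implementation. In particular, the slide does not say how min(LLR(a)) is obtained, so it is passed in as a precomputed number, and the function and parameter names are invented for this example.

```python
import math

def fl(freq, min_llr):
    """FL(a) = f(a) * ln(2 + min(LLR(a))); min_llr is assumed precomputed."""
    return freq * math.log(2 + min_llr)

def lc_value(length, fl_a, fl_longer):
    """LC-value(a).

    length:    |a|, number of words in a
    fl_a:      FL(a)
    fl_longer: FL(b) for every longer candidate b in T(a); empty if a is not nested
    """
    if not fl_longer:
        return math.log2(length) * fl_a
    return math.log2(length) * (fl_a - sum(fl_longer) / len(fl_longer))

def n_value(context_counts, t_sizes, n_terms):
    """N-value(a) = sum over context words b of f_a(b) * |T(b)| / n.

    context_counts: {b: f_a(b)}   t_sizes: {b: |T(b)|}   n_terms: n
    """
    return sum(count * t_sizes.get(b, 0) / n_terms
               for b, count in context_counts.items())

def nlc_value(lc, nv):
    """Equation (1): weighted combination of LC-value and N-value."""
    return 0.8 * lc + 0.2 * nv
```

Keeping the four functions separate mirrors the structure of the definition: FL carries unithood, LC-value adds the termhood correction for nested terms, and N-value adds context information.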

Page 23: A Study of Association Measures and their Combination for Arabic MWT Extraction

The Corpus

Arabic specialized-domain corpora are lacking.
The corpus we built contains 1666 files comprising 53569 distinct tokens (without stop words), extracted from the Web site "Al-Khat Alakhdar".
The corpus covers various environmental topics such as pollution, water purification, soil degradation, forest preservation, climate change and natural disasters.

Page 24: A Study of Association Measures and their Combination for Arabic MWT Extraction

The Evaluation

1. We computed the association scores (LLR, C-value, NC-value, NTC-value, LLR+C-value, NLC-value) for the MWT candidates.
2. From the ranking produced by each statistical measure, we retain the k best candidates, with k ranging from 100 to 300 in steps of 100.
3. We automatically built a reference list of all Arabic MWTs available in the latest version of the AGROVOC thesaurus.
4. We also used translations of the MWTs together with the European terminological database IATE.
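The k-best evaluation described above amounts to computing precision at k for each ranking. A minimal sketch, assuming the scores are held in a dictionary and the reference list in a set (both data structures are choices made for this example):

```python
def precision_at_k(ranked_terms, reference, k):
    """Fraction of the k best-ranked candidates that appear in the reference set."""
    top_k = ranked_terms[:k]
    if not top_k:
        return 0.0
    return sum(term in reference for term in top_k) / len(top_k)

def evaluate(scores, reference, ks=(100, 200, 300)):
    """scores: {candidate term: association score}, higher meaning better."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return {k: precision_at_k(ranked, reference, k) for k in ks}

# Toy data, small enough to check by hand.
toy_scores = {"a": 3.0, "b": 2.0, "c": 1.0, "d": 0.5}
print(evaluate(toy_scores, {"a", "c"}, ks=(1, 2, 4)))
# {1: 1.0, 2: 0.5, 4: 0.5}
```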

Page 25: A Study of Association Measures and their Combination for Arabic MWT Extraction

Obtained results

                       Top MWT considered
Statistical measures   100      200      300
LLR                    75.0%    70.5%    64.3%
C-value                71.0%    69.0%    67.3%
NC-value               74.0%    70.0%    68.3%
NTC-value              80.0%    71.5%    69.7%
LLR+C-value            73.0%    72.0%    68.3%
NLC-value              82.0%    75.5%    73.0%

Table 1: Results obtained for different statistical measures

                       Top MWT considered
Statistical measures   100      200      300
LLR                    35       60       80
C-value                27       59       82
NC-value               32       62       82
NTC-value              35       60       83
LLR+C-value            34       60       84
NLC-value              41       65       86

Table 2: Number of terms found in AGROVOC for each measure

                       Top MWT considered
Statistical measures   100      200      300
LLR                    40       81       113
C-value                44       79       120
NC-value               42       78       123
NTC-value              45       83       126
LLR+C-value            39       84       121
NLC-value              41       86       133

Table 3: Number of terms found in IATE for each measure
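The slides do not state how the percentages in Table 1 were derived, but they are consistent with adding the AGROVOC and IATE counts and dividing by k. The short check below reproduces Table 1 under that assumption, using only the numbers reported in Tables 2 and 3.

```python
KS = (100, 200, 300)

# Counts copied from Table 2 (AGROVOC) and Table 3 (IATE).
AGROVOC = {
    "LLR": (35, 60, 80), "C-value": (27, 59, 82), "NC-value": (32, 62, 82),
    "NTC-value": (35, 60, 83), "LLR+C-value": (34, 60, 84), "NLC-value": (41, 65, 86),
}
IATE = {
    "LLR": (40, 81, 113), "C-value": (44, 79, 120), "NC-value": (42, 78, 123),
    "NTC-value": (45, 83, 126), "LLR+C-value": (39, 84, 121), "NLC-value": (41, 86, 133),
}

def combined_precision(measure):
    """Percentage of top-k candidates found in either resource (assumed disjoint counts)."""
    return [round(100 * (a + i) / k, 1)
            for a, i, k in zip(AGROVOC[measure], IATE[measure], KS)]

for name in AGROVOC:
    print(name, combined_precision(name))
```

For example, LLR at k = 100 gives (35 + 40) / 100 = 75.0%, matching the first cell of Table 1.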

Page 28: A Study of Association Measures and their Combination for Arabic MWT Extraction

Figure 2: Precision obtained for different statistical measures that combine termhood and unithood

Figure 3: Precision obtained for the C/NC-value and the NTC-value

Figure 4: Precision obtained for the LLR and the C/NC-value

Page 29: A Study of Association Measures and their Combination for Arabic MWT Extraction

Conclusion and perspectives

Conclusion
1. A hybrid method for Arabic MWT acquisition that takes advantage of existing linguistic and statistical approaches.
2. A novel statistical measure, NLC-value, for ranking MWT candidates.
3. Experiments were performed for bi-grams and tri-grams on an Arabic environment corpus.

Perspectives
1. Validate the proposed statistical measure on other languages.
2. Use the extracted MWTs for document indexing and retrieval in IR systems.

We thank the reviewers for their useful comments (the results presented here are based on their remarks).

Page 32: A Study of Association Measures and their Combination for Arabic MWT Extraction

Bibliography

Boulaknadel S., Daille B., and Aboutajdine D. 2008a. Multi-word term indexing for Arabic document retrieval. In Proceedings of the IEEE Symposium on Computers and Communications, pp. 869-873.
Dunning T. 1994. Accurate Methods for the Statistics of Surprise and Coincidence. Computational Linguistics, volume 19, pp. 61-74.
Frantzi K. T., Ananiadou S., and Tsujii T. 1998. The C-Value/NC-Value Method of Automatic Recognition for Multi-word Terms. Journal on Research and Advanced Technology for Digital Libraries, pp. 115-130.
Kageura K. and Umino B. 1996. Methods of Automatic Term Recognition: A Review. Terminology, volume 3.
Vu T., Aw A. Ti, and Zhang M. 2008. Term Extraction Through Unithood and Termhood Unification. In Proceedings of IJCNLP.