Coreference Recognition in Arabic
Alshunabier, Atheer; Aldakheel, Bushra
Supervisor: Alsaif, Amal

Upload: arabicnlpimamu2013

Post on 16-Jan-2015


Page 1: Coreference recognition in arabic

Coreference Recognition in Arabic

Alshunabier, Atheer; Aldakheel, Bushra

Supervisor: Alsaif, Amal

Page 2: Coreference recognition in arabic

Overview

Page 3: Coreference recognition in arabic

Agenda

Introduction
Characteristics of the Arabic Language
Related work
Methodology
Discussion
Conclusion

Page 4: Coreference recognition in arabic

Introduction

In linguistics, coreference occurs when multiple expressions in a sentence or document refer to the same thing. For example:

"Mary said she would help me"

Here "Mary" and "she" most likely refer to the same person.

Page 6: Coreference recognition in arabic

Characteristics of the Arabic Language

Arabic differs greatly from English and from other Germanic and Latin-based languages. Certain grammatical differences must be understood before you begin to learn the language.

Arabic is a synthetic language, as opposed to an analytic one.

Written Arabic usually leaves out short vowels, so it is difficult to read unless you have extensive knowledge of the written language.
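The omission of short vowels can be illustrated with a minimal sketch. It assumes the short-vowel marks are the Unicode combining characters U+064B to U+0652 (this range also includes shadda and sukun); the function name `strip_harakat` is hypothetical and only serves the illustration.

```python
# Arabic vowel and related marks (harakat), Unicode U+064B..U+0652.
HARAKAT = {chr(cp) for cp in range(0x064B, 0x0653)}


def strip_harakat(text: str) -> str:
    """Remove the marks, leaving the consonantal skeleton usually seen in print."""
    return "".join(ch for ch in text if ch not in HARAKAT)


# "kataba" (he wrote), fully vocalized: kaf + fatha, ta + fatha, ba + fatha.
vocalized = "\u0643\u064E\u062A\u064E\u0628\u064E"
unvocalized = strip_harakat(vocalized)

print(len(vocalized), len(unvocalized))  # 6 3
```

The unvocalized form keeps only the three consonants k-t-b, so a reader has to supply the vowels from context.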

Page 7: Coreference recognition in arabic

Tri-Consonantal Root Verb

Multiple consonants are attached to a root verb to create a different meaning.

In English: In Arabic:

"He wrote" "kataba"

"He dictated" "aktaba"

Page 8: Coreference recognition in arabic

Active Participle

Used to determine the person who performs an action.

In English: In Arabic:

"drink" "sh-r-b"

"drinker" "sharib"
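Both examples above follow a root-and-pattern scheme, which can be sketched as follows. This is an illustration only: the digit-based pattern notation and the function `apply_pattern` are assumptions, and the patterns use the simplified Latin transliteration from the slides (long vowels are not marked).

```python
def apply_pattern(root, pattern):
    """Fill a pattern with root consonants; digits 1-3 pick a consonant of the root."""
    return "".join(root[int(ch) - 1] if ch.isdigit() else ch for ch in pattern)


KTB = ("k", "t", "b")    # root associated with writing
SHRB = ("sh", "r", "b")  # root associated with drinking

print(apply_pattern(KTB, "1a2a3a"))   # kataba
print(apply_pattern(KTB, "a12a3a"))   # aktaba
print(apply_pattern(SHRB, "1a2i3"))   # sharib
```

The same three consonants stay in order while the surrounding vowels and affixes change the meaning.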

Page 10: Coreference recognition in arabic

Related work

Kehler (1997) used maximum entropy modeling to assign a probability distribution to alternative sets of coreference relationships among noun phrase entity templates, whereas we used decision tree learning.

Ge, Hale, and Charniak (1998) used a statistical model for resolving pronouns, whereas we used a decision tree learning algorithm and resolved general noun phrases, not just pronouns.

The work of Cardie and Wagstaff (1999) also falls under the machine learning approach.

Page 12: Coreference recognition in arabic

MT-based stemmer

Partition the Arabic words into clusters based on their English translations. Arabic words whose English translations match, after English stopwords are removed, are placed in the same cluster.
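The clustering idea can be sketched as below. This is a minimal illustration under stated assumptions, not the stemmer itself: the grouping criterion (identical translations after stopword removal) is an assumption, and the word list, translations, stopword set, and function names are all hypothetical.

```python
from collections import defaultdict

# Hypothetical, very small English stopword list.
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "his", "their"}


def translation_key(english):
    """Lowercase the translation and drop stopwords; the remainder is the cluster key."""
    return tuple(w for w in english.lower().split() if w not in STOPWORDS)


def cluster_by_translation(translations):
    """Group Arabic words whose stopword-free English translations are identical."""
    clusters = defaultdict(list)
    for arabic, english in translations.items():
        clusters[translation_key(english)].append(arabic)
    return dict(clusters)


# Toy input: transliterated Arabic word -> English translation (assumed output of an MT system).
toy = {
    "kitab": "book",
    "al-kitab": "the book",
    "kitabuhu": "his book",
    "walad": "boy",
    "al-walad": "the boy",
}

for key, words in cluster_by_translation(toy).items():
    print(key, words)
```

Each cluster then stands in for a shared stem: the three forms built on "kitab" end up together because they all translate to "book" once stopwords are dropped.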

Page 13: Coreference recognition in arabic

MT-based stemmer

Page 15: Coreference recognition in arabic

Coreference Resolution

Coreference resolution is the process of determining whether two expressions in natural language refer to the same entity in the world.
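One common way to turn such pairwise decisions into groups of coreferent mentions is sketched below. It is an assumption-laden illustration: the pairwise decision is supplied as a plain function (here a hand-written lookup, where a trained classifier would normally go), and `build_chains` is a hypothetical name.

```python
from itertools import combinations


def build_chains(mentions, corefer):
    """Merge mentions into chains using pairwise decisions and a union-find structure."""
    parent = list(range(len(mentions)))

    def find(i):
        # Follow parent links to the representative, compressing the path on the way.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(mentions)), 2):
        if corefer(mentions[i], mentions[j]):
            parent[find(j)] = find(i)

    chains = {}
    for i, mention in enumerate(mentions):
        chains.setdefault(find(i), []).append(mention)
    return list(chains.values())


# Mentions from "Mary said she would help me"; the linked pair is labeled by hand.
mentions = ["Mary", "she", "me"]
linked = {("Mary", "she")}
chains = build_chains(mentions, lambda a, b: (a, b) in linked)
print(chains)  # [['Mary', 'she'], ['me']]
```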

Page 16: Coreference recognition in arabic

Anaphora

Anaphora is a linguistic relation between two textual entities, in which the interpretation of one (the anaphor) depends on the other (the antecedent).

Page 17: Coreference recognition in arabic

Pronominal anaphora in the Quran

Page 18: Coreference recognition in arabic

Why is Arabic Information Extraction difficult?

The Arabic alphabet consists of 28 letters that can be extended to 90 by additional shapes, marks, and vowels.

There are two genders (masculine and feminine), three numbers (singular, dual, and plural), and three grammatical cases (nominative, genitive, and accusative).
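The size of this feature space, and one way such features might be used, can be sketched as follows. The agreement filter is an assumption (a common heuristic for pronoun resolution, not something the slides specify), and the `Mention` class, the function `agrees`, and the transliterated examples are hypothetical.

```python
from dataclasses import dataclass
from itertools import product

GENDERS = ("masculine", "feminine")
NUMBERS = ("singular", "dual", "plural")
CASES = ("nominative", "genitive", "accusative")

# Every gender/number/case combination a nominal form may need to distinguish.
feature_space = list(product(GENDERS, NUMBERS, CASES))
print(len(feature_space))  # 18


@dataclass(frozen=True)
class Mention:
    text: str
    gender: str
    number: str


def agrees(pronoun, candidate):
    """A candidate antecedent is kept only if it matches the pronoun in gender and number."""
    return pronoun.gender == candidate.gender and pronoun.number == candidate.number


pronoun = Mention("hiya", "feminine", "singular")  # "she"
candidates = [
    Mention("Maryam", "feminine", "singular"),
    Mention("al-walad", "masculine", "singular"),
    Mention("al-bintan", "feminine", "dual"),
]
compatible = [c.text for c in candidates if agrees(pronoun, c)]
print(compatible)  # ['Maryam']
```

With a dual number in addition to singular and plural, more candidates can be ruled out by agreement than in English, but the system must first recover these features from text that often lacks short vowels.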

Page 19: Coreference recognition in arabic

Annotated Corpora

The number of corpora annotated both anaphorically and coreferentially has increased.

For English, there are resources such as the Lancaster Anaphoric Treebank and others.

For Arabic, there has been no comparable expansion in anaphoric or coreferential corpus annotation.

Page 20: Coreference recognition in arabic

Annotating Tools

Annotating anaphoric or coreferential relations requires considerable effort from the human annotator.

Several tools assist with this task, such as Callisto, MMAX2, and PALinkA. These tools are written in Java.

Page 22: Coreference recognition in arabic

Conclusion