The Named Entity Recognition (NER)
• Al-Shehri, Aisha
• Almutairi, Shaikhah
• Alswelim, Haya
KINGDOM OF SAUDI ARABIA
Ministry of Higher Education
Al-Imam Muhammad Ibn Saud Islamic University
College of Computer and Information Sciences
Abstract
Named Entity Recognition (NER) is an important component of many natural language processing tasks.
There are different types of named entities, such as people, locations, and organizations.
Introduction
• Named Entity Recognition is the identification and classification of named entities within open-domain text.
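• The task of named entity recognition was defined as three subtasks:
• ENAMEX (names of persons, organizations, and locations), TIMEX (dates and times), and NUMEX (monetary values and percentages).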
• We present an attempt at recognizing and extracting the most important proper-name entity, the person name, for Arabic (PERA).
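Components of an Arabic full name, divided into five main categories (Ibn Auda, 2003):
1. An ism (pronounced IZM): the given name.
2. A kunya (pronounced COON-yah): a teknonym such as Abu (father of) or Umm (mother of) followed by a child's name.
3. A nasab (pronounced NAH-sahb): the patronymic, introduced by ibn (son of) or bint (daughter of).
4. A laqab (pronounced LAH-kahb): an epithet or honorific.
5. A nisba (pronounced NISS-bah): an adjective of origin, tribe, or affiliation, usually ending in -i.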
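A minimal sketch of how these five components might be told apart in a romanized name, assuming space-separated tokens and simple marker words: Abu/Umm for the kunya, ibn/bin/bint for the nasab, and an al- prefix plus a final -i for the nisba. The marker sets, the function name `tag_name_components`, and the fallback to laqab are illustrative assumptions, not PERA's actual rules; Arabic-script input would need its own markers.

```python
# Hypothetical marker words; illustrative assumptions, not PERA's rules.
KUNYA_MARKERS = {"abu", "umm"}          # "father of", "mother of"
NASAB_MARKERS = {"ibn", "bin", "bint"}  # "son of", "daughter of"


def tag_name_components(full_name: str) -> list[tuple[str, str]]:
    """Split a romanized Arabic full name into (text, component) pairs."""
    tokens = full_name.split()
    tags: list[tuple[str, str]] = []
    has_ism = False
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        low = tok.lower()
        if low in KUNYA_MARKERS and i + 1 < len(tokens):
            # Kunya: marker plus the following name, e.g. "Abu Bakr".
            tags.append((f"{tok} {tokens[i + 1]}", "kunya"))
            i += 2
        elif low in NASAB_MARKERS and i + 1 < len(tokens):
            # Nasab: marker plus the father's name, e.g. "ibn Zakariya".
            tags.append((f"{tok} {tokens[i + 1]}", "nasab"))
            i += 2
        elif low.startswith("al-") and low.endswith("i"):
            # Heuristic: nisba adjectives usually end in -i, e.g. "al-Razi".
            tags.append((tok, "nisba"))
            i += 1
        elif not has_ism:
            # The first unmarked token is taken as the given name.
            tags.append((tok, "ism"))
            has_ism = True
            i += 1
        else:
            # Fallback assumption: any other token is treated as an epithet.
            tags.append((tok, "laqab"))
            i += 1
    return tags


print(tag_name_components("Abu Bakr Muhammad ibn Zakariya al-Razi"))
# [('Abu Bakr', 'kunya'), ('Muhammad', 'ism'), ('ibn Zakariya', 'nasab'), ('al-Razi', 'nisba')]
```

The heuristic is easy to fool: a laqab such as al-Mahdi also ends in -i and would be tagged as a nisba, which is one reason rule-based systems pair such rules with gazetteers.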
Methodology
1- Parallel corpora:
a- Reliability
b- Representativeness
2- Previously developed tools for other languages:
a- Person names
b- Location names (geographical locations and toponyms)
c- Organizations (political or administrative entities)
d- Positions (job titles)
e- Acronyms
Challenges
• 1- Arabic has no capital letters or other orthographic signal marking proper names, unlike many other languages.
• 2- The same Arabic word can have different meanings; for example, many person names are also ordinary words.
• 3- Ambiguity
Ambiguous examples (correct type, incorrect type, English translation):
• Date, not Person: 15th of Ramadan Al karim 2005
• Company, not Location: Saudi Aramco
Features
Machine-learning features:
• Word-Length
• Noun-Flag
• Speech-Tag (part-of-speech tag)
• Type-Current
• Type-Left
• Type-Right
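The slides list only the feature names. The sketch below is one plausible reading, assuming Word-Length is the character count, Noun-Flag marks noun part-of-speech tags, Speech-Tag is the tag itself, and Type-Current/Left/Right are the entity types a rule-based stage assigned to the current, previous, and next token. The Penn-style tags ("NN", "NNP", "VBD"), the "BOS"/"EOS" padding values, and the names `TokenFeatures` and `extract_features` are assumptions.

```python
from dataclasses import dataclass


@dataclass
class TokenFeatures:
    word_length: int
    noun_flag: bool
    speech_tag: str
    type_current: str
    type_left: str
    type_right: str


def extract_features(tokens, pos_tags, rule_types):
    """Build one feature record per token.

    tokens: words of the sentence
    pos_tags: part-of-speech tag per token (Penn-style tags assumed)
    rule_types: entity type from the rule-based stage per token, "O" if none
    """
    if not (len(tokens) == len(pos_tags) == len(rule_types)):
        raise ValueError("inputs must have the same length")
    n = len(tokens)
    features = []
    for i, (tok, pos) in enumerate(zip(tokens, pos_tags)):
        features.append(TokenFeatures(
            word_length=len(tok),
            noun_flag=pos.startswith("NN"),  # NN, NNS, NNP, NNPS count as nouns
            speech_tag=pos,
            type_current=rule_types[i],
            type_left=rule_types[i - 1] if i > 0 else "BOS",       # sentence start
            type_right=rule_types[i + 1] if i + 1 < n else "EOS",  # sentence end
        ))
    return features


feats = extract_features(["قال", "السيد", "محمد"], ["VBD", "NN", "NNP"], ["O", "O", "PERSON"])
print(feats[2])
```

Records like these can then be vectorized and passed to a classifier; which learner the original system used is not stated here.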
System Architecture and Implementation
• The architecture of the NERA system consists of three components:
• Gazetteers.
• Grammar.
• Filter.
1) Gazetteers:
Gazetteers contain lists of known named entities.
White list: the white list serves as a fixed, static dictionary of various named entities.
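A minimal sketch of white-list lookup, assuming whitespace-tokenized input and a greedy longest-match strategy so that multi-word names take precedence over their parts. The entries in `WHITE_LIST` and the function name `gazetteer_lookup` are hypothetical; a real gazetteer would be loaded from large curated lists.

```python
# Hypothetical white-list entries (token tuples -> entity type).
WHITE_LIST = {
    ("المملكة", "العربية", "السعودية"): "LOCATION",  # Kingdom of Saudi Arabia
    ("السعودية",): "LOCATION",                       # Saudi Arabia
    ("أرامكو", "السعودية"): "ORGANIZATION",          # Saudi Aramco
    ("آسيا",): "LOCATION",                            # Asia
}
MAX_ENTRY_LEN = max(len(entry) for entry in WHITE_LIST)


def gazetteer_lookup(tokens):
    """Return (start, end, type) spans, end exclusive, by greedy longest match."""
    spans = []
    i = 0
    while i < len(tokens):
        # Try the longest possible entry first so multi-word names win.
        for n in range(min(MAX_ENTRY_LEN, len(tokens) - i), 0, -1):
            candidate = tuple(tokens[i:i + n])
            if candidate in WHITE_LIST:
                spans.append((i, i + n, WHITE_LIST[candidate]))
                i += n
                break
        else:
            i += 1  # no entry starts here; move to the next token
    return spans


print(gazetteer_lookup("تعمل أرامكو السعودية في آسيا".split()))
# [(1, 3, 'ORGANIZATION'), (4, 5, 'LOCATION')]
```

Preferring the longest entry is one simple way to avoid reading "Saudi Aramco" as the location "Saudi Arabia", the Company/Location confusion listed among the ambiguity examples.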
2) Grammar: The grammar recognizes and extracts Arabic named entities from the input text using derived rules.
The following are examples of indicators used within rules:
• Job title: الدكتورة (the doctor), العلوم (the sciences), أستاذ (professor).
• Person title: السيد (Mr.), السيدة (Mrs.).
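A minimal sketch of one such rule, assuming whitespace-tokenized input: a title indicator triggers a PERSON tag on the next one or two tokens. The indicator sets reuse a subset of the examples above; the two-token cap, the `STOP_WORDS` list, and the name `apply_title_rule` are hypothetical, and the actual grammar uses richer derived rules.

```python
# Indicator words from the examples above; STOP_WORDS is a hypothetical
# list of function words assumed to end a name.
PERSON_TITLES = {"السيد", "السيدة"}      # Mr., Mrs.
JOB_TITLES = {"الدكتورة", "أستاذ"}      # the doctor, professor
STOP_WORDS = {"في", "من", "إلى", "على"}  # in, from, to, on


def apply_title_rule(tokens, max_name_len=2):
    """Tag up to max_name_len tokens after a title indicator as PERSON."""
    spans = []
    for i, tok in enumerate(tokens):
        if tok in PERSON_TITLES or tok in JOB_TITLES:
            end = i + 1
            # Extend the name until a stop word, the length cap, or the end.
            while (end < len(tokens)
                   and end - (i + 1) < max_name_len
                   and tokens[end] not in STOP_WORDS):
                end += 1
            if end > i + 1:
                spans.append((i + 1, end, "PERSON"))
    return spans


print(apply_title_rule("قالت الدكتورة سارة أحمد في المؤتمر".split()))
# [(2, 4, 'PERSON')]
```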
3) Filter: Filter rules help resolve recognition ambiguity between named entities.
The filtration mechanism serves two purposes: revising the NE extractor's results and disambiguating matches returned by different NE extractors.
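The filter rules themselves are not given here. To illustrate the second purpose, disambiguating matches from different extractors, the sketch below assumes a simple policy: when candidate spans overlap, keep the longer one, and break ties by a fixed extractor priority. The policy, the `SOURCE_PRIORITY` table, and the names `Match` and `filter_matches` are assumptions standing in for the system's actual filter rules.

```python
from typing import NamedTuple


class Match(NamedTuple):
    start: int    # first token index
    end: int      # exclusive end index
    ne_type: str
    source: str   # extractor that produced the match


# Hypothetical tie-break: when spans are equally long, the gazetteer wins.
SOURCE_PRIORITY = {"gazetteer": 0, "grammar": 1}


def filter_matches(matches):
    """Keep a non-overlapping subset, preferring longer spans, then source priority."""
    ranked = sorted(
        matches,
        key=lambda m: (-(m.end - m.start), SOURCE_PRIORITY.get(m.source, 99), m.start),
    )
    kept = []
    for m in ranked:
        # A match survives only if it does not overlap anything already kept.
        if all(m.end <= k.start or m.start >= k.end for k in kept):
            kept.append(m)
    return sorted(kept, key=lambda m: m.start)
```

For example, an ORGANIZATION match over tokens 1 to 3 suppresses a LOCATION match over token 2 alone, and of two equal-length matches over the same token the gazetteer's is kept.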
Example: typographic variations (variation, English translation, entity type):
• Two dots removed from the taa marbouta (ة written as ه): Saudi Arabia (Location)
• Madda dropped from the aleph (آ written as ا): Asia (Location)
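A minimal sketch of how such variants can be absorbed before gazetteer lookup, assuming both the gazetteer entries and the input words are reduced to one canonical spelling. The two mappings follow the variations listed above; the gazetteer contents and the names `normalize` and `lookup` are hypothetical.

```python
# Canonical forms for the two variations above; other variants (e.g. hamza
# forms of the aleph) would need further entries.
NORMALIZE = str.maketrans({
    "ة": "ه",  # taa marbouta written without its two dots
    "آ": "ا",  # aleph written without the madda
})


def normalize(text: str) -> str:
    return text.translate(NORMALIZE)


# Hypothetical gazetteer, stored in normalized form so variants still match.
GAZETTEER = {
    normalize(name): ne_type
    for name, ne_type in [("السعودية", "LOCATION"), ("آسيا", "LOCATION")]
}


def lookup(word: str):
    """Return the entity type of word, or None, ignoring the two variations."""
    return GAZETTEER.get(normalize(word))


print(lookup("السعوديه"), lookup("اسيا"))  # LOCATION LOCATION
```

Mapping ة to ه also merges some genuinely different words, so this kind of normalization is best confined to matching rather than applied to the text the system outputs.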
The Experiment
Results
Conclusion
• 1- In most cases, we tried to follow more general criteria, applicable to English-Arabic or French-Arabic transliteration.
• 2- This work is part of a new system for Arabic NER, with several activities still ongoing.
References
• Sherief Abdallah, Khaled Shaalan, and Muhammad Shoaib, "Integrating Rule-Based System with Classification for Arabic Named Entity Recognition," 2012.
• Yassine Benajiba, Mona Diab, and Paolo Rosso, "Using Language Independent and Language Specific Features to Enhance Arabic Named Entity Recognition," 2009.
• Yassine Benajiba, Mona Diab, and Paolo Rosso, "Arabic Named Entity Recognition: An SVM-Based Approach," 2009.
• Doaa Samy, Antonio Moreno, and José Mª Guirao, "A Proposal for an Arabic Named Entity Tagger Leveraging a Parallel Corpus," 2005.
• Khaled Shaalan and Hafsa Raza, "Person Name Entity Recognition for Arabic," 2009.