Mining Named Entities - IIITH

Posted on 19 June 2015

Mining Named Entities and Linking Them Based on Wikipedia Data


1. Mining Named Entities
Team No. 35: Gaurav Agrawal (201305574), Mayank Natani (201101114), R. Amit (201205680), Rashmi Sharma (201250855)

2. Concepts
Our goal is to identify named entities, resolve their ambiguity, and classify them by category (such as person, location, or organization). While parsing a document to identify and label entities, it is highly desirable to resolve semantic ambiguity so that each mention is linked to the right named entity. We therefore split the system into two steps: candidate generation and candidate disambiguation. For both steps we use a recall-oriented retrieval model.
SIEL LAB, IIIT HYDERABAD

3. Approach
Information extraction from Wikipedia into three indexes: (1) Entity Mappings, (2) Entity Identification, and (3) Context. The first two indexes are used to identify and link named entities; the Context index is used for disambiguation.

4. Assumptions
Wikipedia categorizes entities correctly.
The data retrieved from Wikipedia is sufficient to identify named entities in news articles.
Named entities that are not present in Wikipedia are not classified or linked.
When finding the context, if the top context words are not present in the news article, the system cannot link or classify the named entity.

5. Architectural Diagram

6. Document Analysis
Identify words after removing stop words.
Identify named entities and link them using the entity and title indexes.
Create a temporary index of the named entities already derived, for use in disambiguation.
Use the context index to disambiguate.

7. Entity Linking Flow

8. Evaluation and Results
We evaluated the system both manually and against the existing Stanford NER tool. Precision and recall were both above 85%. For most recognized entities, the system returned the main Wikipedia page.

9. Conclusion
Our initial experiments suggest that the system, which combines Wikipedia data with techniques for identifying named entities and then linking them, performs well, with precision and recall above 85%. Further techniques (for example, tools from NLTK) could be integrated to improve its performance.

10. References
Yunbo Cao, Chin-Yew Lin, and Guoqing Zheng. MSRA at TAC 2011: Entity Linking. TAC Publications, 2011.
Xianpei Han and Le Sun. A Generative Entity-Mention Model for Linking Entities with Knowledge Base. HLT '11, 2011.
Dave Orr, Amar Subramanya, and Fernando Pereira. Learning from Big Data: 40 Million Entities in Context. 2013.
Tao Zhang, Kang Liu, and Jun Zhao. The NLPR_TAC Entity Linking System at TAC 2011. TAC Publications.
Extracting Information from Text.
Lev Ratinov and Dan Roth. Design Challenges and Misconceptions in Named Entity Recognition.
J. Kazama and K. Torisawa. Exploiting Wikipedia as External Knowledge for Named Entity Recognition. Joint Conference on Empirical Methods, 2007.
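The two-step design described in slides 2, 3 and 6 (candidate generation, then context-based disambiguation over three Wikipedia-derived indexes) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the team's implementation: the record fields (`title`, `aliases`, `category`, `text`), the stop-word list, the longest-match n-gram lookup and the word-overlap score are hypothetical stand-ins, and the plain overlap count replaces the recall-oriented retrieval model, which the slides name but do not specify. The three toy articles are invented.

```python
import re
from collections import Counter

# Hypothetical stop-word list; a real system would use a much fuller one.
STOP_WORDS = {"the", "a", "an", "of", "in", "on", "at", "and", "to", "is",
              "are", "was", "were", "for", "with", "by", "it", "its", "who"}


def content_words(text):
    """Lowercase word tokens with stop words removed."""
    return [w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOP_WORDS]


def build_indexes(articles, top_k=5):
    """Build the three indexes from Wikipedia-like records (hypothetical fields)."""
    mapping_index = {}  # Entity Mappings: surface form -> candidate titles
    title_index = {}    # Entity Identification: title -> category
    context_index = {}  # Context: title -> top-k content words of its article
    for art in articles:
        title = art["title"]
        title_index[title] = art["category"]
        for form in [title] + art.get("aliases", []):
            key = " ".join(content_words(form))
            mapping_index.setdefault(key, set()).add(title)
        counts = Counter(content_words(art["text"]))
        context_index[title] = {w for w, _ in counts.most_common(top_k)}
    return mapping_index, title_index, context_index


def link_entities(text, mapping_index, title_index, context_index, max_n=3):
    """Return (surface form, linked title, category) triples for one document."""
    words = content_words(text)
    temp_index = {}  # temporary index: forms already resolved in this document
    results = []
    i = 0
    while i < len(words):
        # Candidate generation: longest n-gram starting at i found in the mapping index.
        for n in range(min(max_n, len(words) - i), 0, -1):
            form = " ".join(words[i:i + n])
            if form in mapping_index:
                break
        else:
            i += 1
            continue
        i += n
        if form in temp_index:
            title = temp_index[form]
        else:
            candidates = sorted(mapping_index[form])
            if len(candidates) == 1:
                title = candidates[0]
            else:
                # Disambiguation: overlap between the document's other words and each context.
                evidence = set(words) - set(form.split())
                scores = {c: len(context_index[c] & evidence) for c in candidates}
                title = max(candidates, key=scores.get)
                if scores[title] == 0:
                    continue  # no context evidence: leave the mention unlinked
            temp_index[form] = title
        results.append((form, title, title_index[title]))
    return results


# Invented toy records, for illustration only.
articles = [
    {"title": "Michael Jordan", "aliases": ["Jordan"], "category": "PERSON",
     "text": "Michael Jordan is an American basketball player who played for the "
             "Chicago Bulls. Jordan won six championships with the Bulls."},
    {"title": "Jordan", "category": "LOCATION",
     "text": "Jordan is a country in the Middle East. Its capital is Amman."},
    {"title": "Chicago Bulls", "category": "ORGANIZATION",
     "text": "The Chicago Bulls are an American professional basketball team."},
]
indexes = build_indexes(articles)
print(link_entities("Jordan scored points for the Chicago Bulls in basketball.", *indexes))
# → [('jordan', 'Michael Jordan', 'PERSON'), ('chicago bulls', 'Chicago Bulls', 'ORGANIZATION')]
print(link_entities("Protests were reported in Jordan near the capital Amman.", *indexes))
# → [('jordan', 'Jordan', 'LOCATION')]
```

With this toy data, "jordan" resolves to Michael Jordan in the first document because "bulls" and "basketball" overlap with that article's context words, and to the country in the second because of "capital". A mention whose candidates share no context words with the document is left unlinked, which mirrors the last assumption on slide 4.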
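Slide 8 reports precision and recall above 85% but does not say how they were computed. As a hedged sketch, assuming each system output is scored as a (document, mention, linked title) triple that must exactly match a gold annotation, the two measures can be computed as below; the sample triples are hypothetical and are not the team's evaluation data.

```python
def precision_recall(predicted, gold):
    """Precision and recall over exact-match (document, mention, title) triples."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    # Precision: share of the system's links that are correct.
    precision = true_positives / len(predicted) if predicted else 0.0
    # Recall: share of the gold links that the system recovered.
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical annotations, for illustration only.
gold = {("d1", "jordan", "Michael Jordan"), ("d1", "chicago bulls", "Chicago Bulls"),
        ("d2", "jordan", "Jordan"), ("d2", "amman", "Amman")}
predicted = {("d1", "jordan", "Michael Jordan"), ("d1", "chicago bulls", "Chicago Bulls"),
             ("d2", "jordan", "Michael Jordan")}

p, r = precision_recall(predicted, gold)
print(f"precision={p:.3f} recall={r:.3f}")  # → precision=0.667 recall=0.500
```

Here two of the three system links are correct (precision 2/3), and two of the four gold links are found (recall 1/2); a wrong title for a correctly spotted mention counts against both measures.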
