Transcript
Page 1: Mining named entities -IIITH

Mining Named Entities

Team No: 35

Gaurav Agrawal (201305574)

Mayank Natani (201101114)

R. Amit (201205680)

Rashmi Sharma (201250855)

Page 2: Mining named entities -IIITH

Concepts

Our goal is to identify named entities and classify them by category (such as person, location, and organization) after resolving ambiguity. While parsing a document to identify and label entities, it is highly desirable to resolve semantic ambiguity so that each mention maps to the right named entity. We have split the system into two steps: candidate generation and candidate disambiguation. For both steps, we make use of a recall-oriented retrieval model.
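As a rough illustration of recall-oriented candidate generation, the sketch below returns every title that could plausibly match a mention, leaving the choice to the disambiguation step. The alias table, the token-based fallback, and the names `ALIASES`, `build_token_index`, and `generate_candidates` are assumptions made for this example, not the system's actual data or code.

```python
from collections import defaultdict

# Hypothetical alias table: surface form -> candidate Wikipedia titles.
ALIASES = {
    "michael jordan": ["Michael Jordan", "Michael I. Jordan"],
    "jordan": ["Jordan", "Michael Jordan", "Jordan River"],
    "paris": ["Paris", "Paris, Texas", "Paris Hilton"],
}


def build_token_index(aliases):
    """Map each alias token to the titles reachable through that alias."""
    token_index = defaultdict(set)
    for alias, titles in aliases.items():
        for token in alias.split():
            token_index[token].update(titles)
    return token_index


def generate_candidates(mention, aliases, token_index):
    """Return every title that might match the mention (favoring recall)."""
    key = " ".join(mention.lower().split())
    candidates = set(aliases.get(key, []))
    # Recall-oriented fallback (an assumption): accept titles sharing any token.
    for token in key.split():
        candidates |= token_index.get(token, set())
    return sorted(candidates)


TOKEN_INDEX = build_token_index(ALIASES)
print(generate_candidates("Jordan", ALIASES, TOKEN_INDEX))
```

Favoring recall here means the candidate list may contain wrong titles; that is acceptable only because a second step narrows it down.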

SIEL LAB, IIIT HYDERABAD

Page 3: Mining named entities -IIITH

Approach

• Information Extraction from Wikipedia

1. Entity Mappings

2. Entity Identification

3. Context

The first two indexes are used to identify and link named entities; the Context index is used when disambiguation is needed.
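The slides do not spell out what each index stores, so the following is only one plausible reading: Entity Mappings as alias to candidate titles, Entity Identification as title to category, and Context as title to its most frequent content words. The page records, field names, and `build_indexes` are hypothetical and stand in for parsed Wikipedia data.

```python
from collections import Counter, defaultdict

# Hypothetical, hand-made records standing in for parsed Wikipedia pages.
PAGES = [
    {"title": "Paris", "category": "location",
     "aliases": ["Paris", "City of Light"],
     "text": "Paris is the capital city of France on the river Seine"},
    {"title": "Paris Hilton", "category": "person",
     "aliases": ["Paris Hilton", "Paris"],
     "text": "Paris Hilton is an American media personality and model"},
]

STOP_WORDS = {"is", "the", "of", "on", "an", "and", "a"}


def build_indexes(pages, top_k=5):
    """Build the three indexes under the assumed interpretation above."""
    mappings = defaultdict(list)   # alias -> candidate titles
    identification = {}            # title -> category
    context = {}                   # title -> top context words
    for page in pages:
        title = page["title"]
        identification[title] = page["category"]
        for alias in page["aliases"]:
            mappings[alias.lower()].append(title)
        words = [w for w in page["text"].lower().split() if w not in STOP_WORDS]
        context[title] = [w for w, _ in Counter(words).most_common(top_k)]
    return dict(mappings), identification, context


mappings, identification, context = build_indexes(PAGES)
print(mappings["paris"])
print(identification["Paris Hilton"])
print(context["Paris"])
```

In this toy data the alias "paris" maps to two titles, which is exactly the case where the context index would be consulted.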

Page 4: Mining named entities -IIITH

Assumptions

• Wikipedia has categorized the entities correctly.

• Data retrieved from Wikipedia will be sufficient for identifying named entities in news articles.

• Named entities which are not present in Wikipedia will not be classified or linked.

• While finding the context, if none of an entity's top words are present in the news article, the system will not be able to link or classify that named entity.
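The last assumption can be illustrated with a small sketch of context-based disambiguation that abstains when no context word appears in the article. The overlap-count scoring, the toy `CONTEXT_INDEX`, and the function name `disambiguate` are assumptions for illustration; the slides do not state how the system actually scores candidates.

```python
# Hypothetical context index: title -> top context words from its Wikipedia page.
CONTEXT_INDEX = {
    "Paris": ["france", "capital", "seine"],
    "Paris Hilton": ["model", "hotel", "heiress"],
}


def disambiguate(candidates, article_words, context_index):
    """Pick the candidate whose context words overlap the article most.

    Returns None when no candidate shares a context word with the article,
    mirroring the assumption that such entities cannot be linked.
    """
    best_title, best_overlap = None, 0
    for title in candidates:
        overlap = len(set(context_index.get(title, [])) & article_words)
        if overlap > best_overlap:
            best_title, best_overlap = title, overlap
    return best_title


article = set("the mayor of the french capital spoke near the seine".split())
print(disambiguate(["Paris", "Paris Hilton"], article, CONTEXT_INDEX))
print(disambiguate(["Paris", "Paris Hilton"], {"stocks", "fell"}, CONTEXT_INDEX))
```

The second call returns `None` because neither candidate's context words occur in the article, so the mention is left unlinked rather than guessed.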

Page 5: Mining named entities -IIITH

Architectural Diagram

Page 6: Mining named entities -IIITH

Document Analysis

• Identify words after removing stop words.

• Identify named entities and link them using the entity and title indexes.

• Create a temporary index of already derived named entities for use in disambiguation.

• Use context index to disambiguate.
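The steps above can be sketched as a single pass over a document. This is a minimal sketch under several assumptions: mentions are found by longest n-gram match against a toy mapping index, the temporary index simply remembers how a surface form was resolved earlier in the same document, and ambiguity is settled by counting shared context words. `MAPPINGS`, `CONTEXT`, and `analyze` are hypothetical names, not the system's code.

```python
import re

STOP_WORDS = {"the", "a", "an", "of", "in", "is", "and", "to", "on", "was"}

# Hypothetical toy indexes standing in for the Wikipedia-derived ones.
MAPPINGS = {
    "paris": ["Paris", "Paris Hilton"],
    "paris hilton": ["Paris Hilton"],
    "france": ["France"],
}
CONTEXT = {
    "Paris": {"france", "capital", "seine"},
    "Paris Hilton": {"model", "hotel", "heiress"},
    "France": {"europe", "paris", "republic"},
}


def analyze(text, mappings, context, max_len=3):
    """Return (surface form, title) pairs for the mentions that can be linked."""
    tokens = re.findall(r"[A-Za-z]+", text)
    content_words = {t.lower() for t in tokens} - STOP_WORDS
    temp_index = {}  # surface form -> title (or None) resolved earlier
    links = []
    i = 0
    while i < len(tokens):
        # Try the longest n-gram first so "Paris Hilton" beats "Paris".
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            surface = " ".join(tokens[i:i + n]).lower()
            if surface in STOP_WORDS or surface not in mappings:
                continue
            if surface not in temp_index:
                candidates = mappings[surface]
                if len(candidates) == 1:
                    temp_index[surface] = candidates[0]
                else:
                    # Score each candidate by shared context words.
                    scores = {t: len(context.get(t, set()) & content_words)
                              for t in candidates}
                    best = max(scores, key=scores.get)
                    # Abstain when no context word is present in the document.
                    temp_index[surface] = best if scores[best] > 0 else None
            if temp_index[surface] is not None:
                links.append((surface, temp_index[surface]))
            i += n
            break
        else:
            i += 1  # no mention starts at this token
    return links


print(analyze("Paris is the capital of France", MAPPINGS, CONTEXT))
print(analyze("Paris Hilton visited a hotel", MAPPINGS, CONTEXT))
```

Caching resolutions in `temp_index` keeps repeated mentions in one document consistent, which is one simple way to read the "temp index" step; the real system may use the derived entities differently.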

Page 7: Mining named entities -IIITH

Entity Linking Flow

Page 8: Mining named entities -IIITH

Evaluation and Results

• Evaluated both manually and against the existing Stanford NER tool.

• Precision and recall were both above 85%.

• Returned the main Wikipedia page for most of the recognized entities.
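For reference, precision and recall over linked entities can be computed as below. The slides do not describe the exact evaluation protocol, so this sketch assumes that predictions and gold annotations are sets of (mention, title) pairs; the data is made up for the example and is unrelated to the reported figures.

```python
def precision_recall(predicted, gold):
    """Compute precision and recall over sets of (mention, title) pairs."""
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Toy data only; not the system's actual output or gold standard.
predicted = {
    ("paris", "Paris"),
    ("france", "France"),
    ("jordan", "Jordan"),
    ("apple", "Apple Inc."),
}
gold = {
    ("paris", "Paris"),
    ("france", "France"),
    ("jordan", "Michael Jordan"),
    ("apple", "Apple Inc."),
    ("seine", "Seine"),
}

precision, recall = precision_recall(predicted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Here three of four predictions are correct and three of five gold entities are found, so a wrong link lowers precision while a missed entity lowers recall.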

Page 9: Mining named entities -IIITH

Conclusion

Our initial experiments suggest that our system, which combines Wikipedia data with techniques to identify named entities and then link them, has performed exceptionally well. Further tools (such as NLTK) can be integrated to make the system perform better.

Page 10: Mining named entities -IIITH

References

• Yunbo Cao, Chin-Yew Lin & Guoqing Zheng. MSRA at TAC 2011: Entity Linking. 2011. TAC Publications.

• Xianpei Han & Le Sun. A Generative Entity-Mention Model for Linking Entities with Knowledge Base. 2011. HLT '11.

• Dave Orr, Amar Subramanya & Fernando Pereira. Learning from Big Data: 40 Million Entities in Context. 2013.

• Tao Zhang, Kang Liu & Jun Zhao. The NLPR_TAC Entity Linking System at TAC 2011. TAC Publications.

• Extracting Information from Text. http://www.nltk.org/book/ch07.html

• Lev Ratinov & Dan Roth. Design Challenges and Misconceptions in Named Entity Recognition.

• J. Kazama & K. Torisawa. Exploiting Wikipedia as External Knowledge for Named Entity Recognition. 2007. Joint Conference on Empirical Methods.
