EDIUM: Improving Entity Disambiguation via User Modelling

ROMIL BANSAL, SANDEEP PANEM, MANISH GUPTA, VASUDEVA VARMA

INTERNATIONAL INSTITUTE OF INFORMATION TECHNOLOGY, HYDERABAD

14th April 2014

Introduction (Entity Disambiguation)

Entity Disambiguation is the task of finding the correct entity referent in the knowledge base for the given mention.

Introduction (User modelling)

User modelling is the task of categorizing users’ activities, so as to customize and adapt the system based on the user’s needs.

Tweets by the User @GameOfThrones (Official HBO Game of Thrones TV Series Handle)

Motivation

a. Short text from social media (e.g. Twitter, Facebook etc.) is an important source of information.

b. Entities are important for detecting and tracking information shared about various products.

◦ Events and locations.
◦ Reputations about companies and people.
◦ Movies, Sports etc.

c. Named Entity Detection (NED) is difficult in micro-posts as they lack sufficient context.

d. Entities from user’s previous tweets could help in creating interest models that could further help in disambiguating new entity mentions.

Related Work

Many models have been proposed to disambiguate entities in text. Several of them [ASMP12, NERT11, EDTL13] disambiguate entities based on the following parameters.

Context Aware Entity Disambiguation
◦ Use text around the entity for disambiguation

Popularity based Entity Disambiguation
◦ Likelihood of candidate entity being the target for the given mention

We disambiguate entities by combining contextual models with user models built by analyzing the user’s tweeting behavior.
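The popularity-based signal mentioned above can be illustrated with a small sketch. It estimates the likelihood of each candidate entity for a mention from link counts. The `LINK_COUNTS` table and the `popularity_prior` function are hypothetical names introduced here for illustration, and the counts are made up; the cited systems may compute popularity differently.

```python
from collections import Counter

# Hypothetical (mention -> entity) link counts, e.g. gathered from anchor texts.
# The numbers are invented for illustration only.
LINK_COUNTS = {
    "apple": Counter({"Apple_Inc.": 80, "Apple_(fruit)": 20}),
}

def popularity_prior(mention):
    """Return the share of links for each candidate entity of a mention."""
    counts = LINK_COUNTS.get(mention.lower(), Counter())
    total = sum(counts.values())
    if total == 0:
        return {}  # unknown mention: no candidates
    return {entity: n / total for entity, n in counts.items()}
```

A prior like this ignores the tweet text entirely, which is why it is typically combined with a context-aware score.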

Problem

Entity Disambiguation

User modelling

Our Approach (System Architecture)

[Figure: system architecture diagram, with weights α and 1-α]

The EDIUM System

Self-learn the user’s interests.

◦ Use an existing context-based method for disambiguation.
◦ Add highly confident (ratio test, confidence > 90%) disambiguations from the user’s tweets to build the user model.
◦ Cluster the interests based on semantic similarity between different entities.
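The self-learning step can be sketched as follows. This is a minimal illustration, not the authors' implementation: the slides do not define the ratio test, so the sketch assumes confidence is the top candidate's share of the total context score. The names `confident_entity` and `update_user_model` are hypothetical, and the clustering step is omitted.

```python
from collections import Counter

CONFIDENCE_THRESHOLD = 0.9  # the "> 90%" threshold from the slides

def confident_entity(candidates):
    """Return the top candidate if it is confident enough, else None.

    `candidates` maps entity -> non-negative context-based score.
    ASSUMPTION: confidence = top score / sum of all scores.
    """
    total = sum(candidates.values())
    if total <= 0:
        return None
    best = max(candidates, key=candidates.get)
    if candidates[best] / total > CONFIDENCE_THRESHOLD:
        return best
    return None

def update_user_model(user_model, tweet_candidates):
    """Add confident disambiguations from one tweet to the user model.

    `tweet_candidates` maps mention -> {entity: score};
    `user_model` is a Counter of entities seen so far.
    """
    for candidates in tweet_candidates.values():
        entity = confident_entity(candidates)
        if entity is not None:
            user_model[entity] += 1
    return user_model
```

Only disambiguations that clear the threshold reach the model, so an ambiguous mention in an early tweet does not pollute the user's interest profile.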

The EDIUM System

Compute the user-based disambiguation score of each candidate entity based on the semantic similarity between the entity and the user’s interest topics.

Compute the context-based disambiguation score of the candidate entity from the context-based systems.

Rank the candidates on the context-based as well as the user-model scores.

Select the candidate entity with the maximum score as the final disambiguated entity for the given mention.
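The ranking and selection steps can be sketched as a weighted combination of the two scores. The architecture figure shows weights α and 1-α, but the slides do not say which score receives which weight; this sketch assumes α weights the user-model score. The function names are hypothetical.

```python
def combined_score(user_score, context_score, alpha):
    """Linear mix of the two scores.

    ASSUMPTION: alpha weights the user-model score and
    (1 - alpha) weights the context-based score.
    """
    return alpha * user_score + (1 - alpha) * context_score

def disambiguate(user_scores, context_scores, alpha):
    """Return the candidate entity with the maximum combined score.

    Both arguments map entity -> score; a missing score counts as 0.
    Candidates are sorted first so that ties resolve deterministically.
    """
    candidates = sorted(set(user_scores) | set(context_scores))
    if not candidates:
        return None
    return max(
        candidates,
        key=lambda e: combined_score(
            user_scores.get(e, 0.0), context_scores.get(e, 0.0), alpha
        ),
    )
```

With α = 0 the sketch reduces to the context-based system alone, which matches the idea of falling back on context when the user model is unreliable.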

The EDIUM System

Re-calculate the weight α based on the similarity of the topics of the user’s new tweet with the topics of the previous m tweets. This reduces the dependency on the user model for entity disambiguation when the user model is incomplete or the user’s tweets are too general.

The update uses two quantities: the cosine similarity between the tweet categories vector obtained by the system and the tweet categories vector given by the user model; and the cosine similarity between the tweet categories vector obtained by the system and the tweet categories vector given by the contextual model.
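The two cosine similarities can be computed as in the sketch below, with category vectors represented as sparse dictionaries. The update rule for α did not survive in the slides, so `recompute_alpha` is purely an illustrative assumption (a simple normalization of the two similarities) and should not be read as the authors' formula. All names here are hypothetical.

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two sparse vectors (category -> weight)."""
    dot = sum(u.get(k, 0.0) * v.get(k, 0.0) for k in set(u) | set(v))
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an empty vector is treated as dissimilar to everything
    return dot / (norm_u * norm_v)

def recompute_alpha(sim_user, sim_context):
    """ASSUMPTION: alpha is the user-model similarity's share of the total.

    This normalization is a placeholder; the original rule is not
    recoverable from the slides.
    """
    total = sim_user + sim_context
    if total == 0:
        return 0.5  # no evidence either way: weight both models equally
    return sim_user / total
```

Under this placeholder rule, a user model whose categories match the new tweet poorly yields a small α, so the context-based score dominates.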

Dataset

We evaluated the performance of EDIUM on a dataset annotated manually by three individuals.

The dataset consists of 200 tweets from each of 20 randomly selected Twitter users.

Results (Entity Disambiguation)

Fig. 1: Performance with Wikipedia Miner

Fig. 2: Performance with DBpedia Spotlight

Observations

The system works better with Wikipedia Miner [WIKM13] than with DBpedia Spotlight [DSSL11].

The system depends on the underlying contextual modelling system to learn the user’s interests initially.

More precise contextual systems lead to greater improvement in the results.

Conclusion

In this paper, we have modeled entity disambiguation based on the user’s past interest information.

We proposed a way to model the user’s interests using the entity linking techniques and then using it later to improve the disambiguation in entity linking systems. The gain in precision is proportional to the accuracy of the underlying entity linking system.

Future Work

Future work requires more analysis of the user modelling aspect of the system. Along with the user’s previous tweets, the user’s network and demographic information could also be considered to further improve entity disambiguation.

Thank you! Questions?

References

[RESE13] Murnane, E. L., Haslhofer, B., Lagoze, C.: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text. In: Proc. of the 22nd Intl. Conf. on World Wide Web (WWW), Republic and Canton of Geneva, Switzerland (2013)

[ASMP12] E. Meij, W. Weerkamp, and M. de Rijke. Adding semantics to microblog posts. In WSDM 2012. ACM, 2012

[ELFT13] X. Liu, Y. Li, H. Wu, M. Zhou, F. Wei, and Y. Lu. 2013. Entity linking for tweets. In Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics

[NERT11] A. Ritter, S. Clark, Mausam, and O. Etzioni. Named Entity Recognition in Tweets: An Experimental study. In Proc. Of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP), 2011

[DUTI10] Michelson, M., Macskassy, S. A.: Discovering Users’ Topics of Interest on Twitter: A First Look. In: Proc. of the 4th Workshop on Analytics for Noisy Unstructured Text Data, ACM (2010) 73–80

[ABIR10] Q. Wu, C. J. Burges, K. M. Svore, and J. Gao. Adapting Boosting for Information Retrieval Measures. Journal of Information Retrieval, 13(3):254–270, Jun 2010

[DSSL11] Mendes, P. N., Jakob, M., García-Silva, A., Bizer, C.: DBpedia Spotlight: Shedding Light on the Web of Documents. In: Proc. of the 7th Intl. Conf. on Semantic Systems, New York, NY, USA, ACM (2011)

[WIKM13] Milne, D., Witten, I. H.: An Open-source Toolkit for Mining Wikipedia. Artificial Intelligence 194 (2013) 222–239

[EDTL13] Yerva, S. R., Catasta, M., Demartini, G., Aberer, K.: Entity Disambiguation in Tweets Leveraging User Social Profiles. In: Proc. of the 2013 Intl. Conf. on Information Reuse and Integration (IRI), 2013, IEEE (2013) 120–128
