EDIUM: Improving Entity Disambiguation via User Modelling

Romil Bansal, Sandeep Panem, Manish Gupta, Vasudeva Varma
International Institute of Information Technology, Hyderabad

14th April 2014




Introduction (Entity Disambiguation)

Entity Disambiguation is the task of finding the correct entity referent in the knowledge base for the given mention.


Introduction (User modelling)

User modelling is the task of categorizing users' activities so as to customize and adapt the system to the user's needs.

Tweets by the User @GameOfThrones (Official HBO Game of Thrones TV Series Handle)


Motivation

a. Short text from social media (e.g. Twitter, Facebook) is an important source of information.

b. Entities are important for detecting and tracking information shared about various products.
   a. Events and locations.
   b. Reputations of companies and people.
   c. Movies, sports, etc.

c. Named Entity Detection (NED) is difficult in micro-posts because they lack sufficient context.

d. Entities from a user's previous tweets could help in creating interest models, which could in turn help in disambiguating new entity mentions.


Related Work

Many models have been proposed to disambiguate entities in text. Several of them [ASMP12, NERT11, EDTL13] disambiguate entities based on the following signals.

Context Aware Entity Disambiguation
◦ Use the text around the entity for disambiguation

Popularity based Entity Disambiguation
◦ Likelihood of a candidate entity being the target for the given mention
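As a rough illustration of the popularity-based signal, the likelihood of a candidate can be estimated from how often a mention links to each entity in a corpus such as Wikipedia. The sketch below is not taken from the cited systems: the function name `popularity_prior` and the link counts are hypothetical.

```python
from collections import Counter


def popularity_prior(link_counts):
    """Turn raw mention-to-entity link counts into a probability per candidate."""
    total = sum(link_counts.values())
    if total == 0:
        return {}
    return {entity: count / total for entity, count in link_counts.items()}


# Hypothetical counts of how often the mention "Apple" links to each entity.
apple_counts = Counter({"Apple_Inc.": 80, "Apple_(fruit)": 20})
prior = popularity_prior(apple_counts)

# The most popular candidate is the one with the highest prior.
best = max(prior, key=prior.get)
```

A prior like this ignores the tweet itself, which is why such systems usually pair it with a context-based score.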

We disambiguate entities by combining contextual models with user models built from the user's tweeting behavior.


Problem

Entity Disambiguation

User modelling


Our Approach (System Architecture)

(Architecture diagram; weights shown: α and 1-α.)


The EDIUM System

Self-learn the user's interests.
◦ Use an existing context-based method for disambiguation.
◦ Add highly confident disambiguations (ratio test, confidence > 90%) from the user's tweets to the user model.
◦ Cluster the interests based on the semantic similarity between entities.
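The self-learning step can be sketched as follows. The slides do not define the ratio test, so this sketch assumes it means the top candidate's share of the total candidate score. The names `confident_entity` and `update_user_model` and the example scores are hypothetical; only the 90% threshold comes from the slide. Clustering is left out.

```python
from collections import Counter

CONFIDENCE_THRESHOLD = 0.9  # from the slide: confidence > 90%


def confident_entity(candidate_scores, threshold=CONFIDENCE_THRESHOLD):
    """Return the top candidate if its share of the total score exceeds the threshold.

    Reading the "ratio test" as top score / sum of scores is an assumption.
    """
    total = sum(candidate_scores.values())
    if total <= 0:
        return None
    best = max(candidate_scores, key=candidate_scores.get)
    return best if candidate_scores[best] / total > threshold else None


def update_user_model(user_model, candidate_scores):
    """Add the disambiguated entity to the user model only when it is confident."""
    entity = confident_entity(candidate_scores)
    if entity is not None:
        user_model[entity] += 1
    return user_model


user_model = Counter()
# Confident disambiguation: added to the user model.
update_user_model(user_model, {"Game_of_Thrones": 0.95, "A_Game_of_Thrones": 0.05})
# Ambiguous disambiguation: skipped.
update_user_model(user_model, {"Apple_Inc.": 0.6, "Apple_(fruit)": 0.4})
```

Keeping only confident disambiguations limits how much noise from the context-based system leaks into the user model.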


The EDIUM System

Compute the user-based disambiguation score of each candidate entity from its semantic similarity to the user's interest topics.

Compute the context-based disambiguation score of each candidate entity using the context-based system.

Rank the candidates using both the context-based and the user-model scores.

Select the candidate entity with the maximum score as the final disambiguated entity for the given mention.
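The ranking and selection steps can be sketched as a weighted combination of the two scores. The linear form is an assumption suggested by the α and 1-α labels on the System Architecture slide. Attaching α to the user-model score is also an assumption, and the function names and example scores are hypothetical.

```python
def combined_score(user_score, context_score, alpha):
    """Linear interpolation of the two scores (assumed form, not confirmed by the slides)."""
    return alpha * user_score + (1.0 - alpha) * context_score


def disambiguate(candidates, alpha):
    """Pick the candidate with the maximum combined score.

    `candidates` maps entity -> (user_score, context_score).
    """
    return max(candidates, key=lambda entity: combined_score(*candidates[entity], alpha))


# Hypothetical scores for the mention "Apple" in a tweet by a tech-focused user.
candidates = {
    "Apple_Inc.": (0.9, 0.4),
    "Apple_(fruit)": (0.1, 0.6),
}

with_user_model = disambiguate(candidates, alpha=0.5)
context_only = disambiguate(candidates, alpha=0.0)
```

With `alpha=0.0` the user model is ignored and the context-based winner is returned; raising `alpha` lets the user's interests override weak contextual evidence.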


The EDIUM System

Re-calculate the weight α based on the similarity of the topics of the user's new tweet to the topics of the previous m tweets. This reduces the dependence on the user model when the user model is incomplete or the user's tweets are too general.

α is computed from two cosine similarities: one between the tweet-categories vector obtained by the system and the tweet-categories vector from the user model, and one between the tweet-categories vector obtained by the system and the tweet-categories vector from the contextual model.
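The two cosine similarities can be computed as sketched below, with category vectors represented as dictionaries from category name to weight. The slides do not show how the similarities are turned into α, so `recompute_alpha` uses a hypothetical normalisation for illustration only. The category names and weights are also made up.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two sparse category vectors (category -> weight)."""
    dot = sum(weight * b.get(category, 0.0) for category, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def recompute_alpha(sim_user, sim_context):
    """Hypothetical normalisation; NOT the formula from the slides."""
    total = sim_user + sim_context
    return 0.5 if total == 0.0 else sim_user / total


# Hypothetical category vectors for one tweet.
system_vec = {"Television": 1.0, "Fantasy": 1.0}
user_vec = {"Television": 1.0, "Fantasy": 1.0}
context_vec = {"Television": 1.0}

sim_user = cosine_similarity(system_vec, user_vec)
sim_context = cosine_similarity(system_vec, context_vec)
alpha = recompute_alpha(sim_user, sim_context)
```

Under this assumed normalisation, α grows when the system's categories agree more with the user model than with the contextual model, and shrinks when the user model is a poor match.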


Dataset

We evaluated the performance of EDIUM on a dataset annotated manually by three individuals.

The dataset consists of 200 tweets from each of 20 randomly selected Twitter users.


Results (Entity Disambiguation)

Fig. 1: Performance with Wikipedia Miner

Fig. 2: Performance with DBpedia Spotlight


Observations

The system works better with Wikipedia Miner [WIKM13] than with DBpedia Spotlight [DSSL11].

The system depends on the underlying contextual modelling system to learn the user's interests initially.

A more precise contextual system leads to a greater improvement in the results.


Conclusion

In this paper, we have modelled entity disambiguation based on the user's past interest information.

We proposed a way to model the user's interests using entity linking techniques and then to use that model to improve disambiguation in entity linking systems. The gain in precision is proportional to the accuracy of the underlying entity linking system.


Future Work

The user modelling aspect of the system needs further analysis. Along with the user's previous tweets, the user's network and demographic information could also be considered to further improve entity disambiguation.


Thank you! Questions?


References

[RESE13] Murnane, E. L., Haslhofer, B., Lagoze, C.: RESLVE: Leveraging User Interest to Improve Entity Disambiguation on Short Text. In: Proc. of the 22nd Intl. Conf. on World Wide Web (WWW), Republic and Canton of Geneva, Switzerland (2013)

[ASMP12] Meij, E., Weerkamp, W., de Rijke, M.: Adding Semantics to Microblog Posts. In: WSDM 2012, ACM (2012)

[ELFT13] Liu, X., Li, Y., Wu, H., Zhou, M., Wei, F., Lu, Y.: Entity Linking for Tweets. In: Proc. of the 51st Annual Meeting of the Association for Computational Linguistics (2013)

[NERT11] Ritter, A., Clark, S., Mausam, Etzioni, O.: Named Entity Recognition in Tweets: An Experimental Study. In: Proc. of the 2011 Conf. on Empirical Methods in Natural Language Processing (EMNLP) (2011)

[DUTI10] Michelson, M., Macskassy, S. A.: Discovering Users' Topics of Interest on Twitter: A First Look. In: Proc. of the 4th Workshop on Analytics for Noisy Unstructured Text Data, ACM (2010) 73–80

[ABIR10] Wu, Q., Burges, C. J., Svore, K. M., Gao, J.: Adapting Boosting for Information Retrieval Measures. Journal of Information Retrieval 13(3) (Jun 2010) 254–270


[DSSL11] Mendes, P. N., Jakob, M., García-Silva, A., Bizer, C.: DBpedia Spotlight: Shedding Light on the Web of Documents. In: Proc. of the 7th Intl. Conf. on Semantic Systems, New York, NY, USA, ACM (2011)

[WIKM13] Milne, D., Witten, I. H.: An Open-source Toolkit for Mining Wikipedia. Artificial Intelligence 194 (2013) 222–239

[EDTL13] Yerva, S. R., Catasta, M., Demartini, G., Aberer, K.: Entity Disambiguation in Tweets Leveraging User Social Profiles. In: Proc. of the 2013 Intl. Conf. on Information Reuse and Integration (IRI), IEEE (2013) 120–128