

Dynamic Collective Entity Representations for Entity Ranking

WSDM ‘16 - San Francisco, CA, USA

David Graus, University of Amsterdam

Edgar Meij, Yahoo Labs

Manos Tsagkias, 904Labs

Wouter Weerkamp, 904Labs

Maarten de Rijke, University of Amsterdam

[email protected] | @dvdgrs | www.graus.nu

Dynamic expansions (Tweets, Tags, Queries)

- tupac and the law
- hiphop/icons
- dead rappers
- people influenced by tupac
- awesomeartist rapd
- Happy Birthday Tupac!!! 2Pac Gemini
- RT: Las cenizas de Tupac, el mejor rapero de la historia, fueron mezcladas con marihuana y fumadas por miembros de Outlawz [English: Tupac's ashes, the best rapper in history, were mixed with marijuana and smoked by members of Outlawz]
- Even more crazy that this was announced just one day before what would have been Pac’s 40th birthday.

KB Anchors: 2Pac, Tupac, Makaveli

KB Linked entities: The Notorious B.I.G., Black Panther Party, Muammar Gaddafi

KB Redirects: 2pac Shakur, Thug Immortal

KB Categories: Murdered Rappers, Death Row Record Artists, American deists

Web Anchors

- What job did Tupac have before he was a rapper
- Tupac
- Tupac is arguably more influential
- Tupac Amaru Shakur
- Tupac Shakur-style drive-by shooting
- Tupac Shakur
- Tupac Shakur reciting Shakespeare at art school

Internal expansions External expansions

Description sources

- KB: Wikipedia dump (Aug ’14). 57M descriptions for 4.8M entities.
- Web anchors: Anchors from the Google WikiLinks corpus. 9.8M descriptions for 876,063 entities.
- Tweets: Tweets with links to Wikipedia pages (2011-2014). 52,631 descriptions for 38,269 entities.
- Queries: Queries from MSN query logs that yield Wikipedia clicks. 47,002 descriptions for 18,724 entities.
- Social tags: Delicious tags for Wikipedia pages, from the SocialBM0311 corpus. 4.4M descriptions for 289,015 entities.

Dynamic sources

Static sources

Static expansions

Tupac Shakur

Tupac Amaru Shakur (Previously known as Lesane Parish Crooks) (too-pahk shə-koor;[1] June 16, 1971 – September 13, 1996), also known by his stage names 2Pac and (briefly) Makaveli, was an American rapper, author, actor, and poet.[2] As of 2007, Shakur has sold over 75 million records worldwide, making him one of the best-selling music artists of all time.[3] His double disc albums All Eyez on Me and his Greatest Hits are among the [...]

Original entity description

Entity description

Results

Performance with different description sources: (cumulative average) MAP on the y-axis (over 5 folds), number of queries on the x-axis.

Field weights over time: (relative) feature weights on the y-axis; on the x-axis, each chunk of (500) queries after which the ranker is retrained.
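The cumulative average MAP reported in the first plot can be illustrated with a minimal sketch. The function names below are hypothetical and not taken from the poster; the sketch only assumes that each query has a ranked list of entity ids and a set of relevant (clicked) entities.

```python
def average_precision(ranking, relevant):
    """Average precision of one ranked list of entity ids."""
    hits = 0
    precision_sum = 0.0
    for rank, entity in enumerate(ranking, start=1):
        if entity in relevant:
            hits += 1
            precision_sum += hits / rank  # precision at this rank
    return precision_sum / len(relevant) if relevant else 0.0


def cumulative_map(ap_values):
    """Running mean of AP after each query, in query order."""
    curve, total = [], 0.0
    for n, ap in enumerate(ap_values, start=1):
        total += ap
        curve.append(total / n)
    return curve
```

Plotting `cumulative_map(...)` against the query index gives a curve of the kind described for the first plot; averaging such curves over the 5 folds is left out of the sketch.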

Adaptive ranking

Features

Given: query q, retrieve top-k candidate entities, extract:

1. Field-similarity features
For each field in the entity representation:
- TFxIDF similarity between q and field

2. Field importance features
For each field, compute scores that reflect its importance:
- Length of field (chars/words)
- Number of updates to field
- Terms in fields not seen in original descr. (novel content)

3. Entity importance features
For each entity, compute a score that reflects its importance:
- Time since last update
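The three feature groups above could be extracted roughly as follows. This is a sketch under stated assumptions, not the authors' implementation: the entity layout (`fields`, `original`, `last_update`), the smoothed IDF, the use of cosine similarity, and counting one update per stored description are all choices made here for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    return text.lower().split()


def tfidf_vector(tokens, doc_freq, n_docs):
    """TF x IDF weights; the smoothed IDF is an assumption of this sketch."""
    tf = Counter(tokens)
    return {t: c * (math.log((1 + n_docs) / (1 + doc_freq.get(t, 0))) + 1.0)
            for t, c in tf.items()}


def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm = (math.sqrt(sum(w * w for w in u.values()))
            * math.sqrt(sum(w * w for w in v.values())))
    return dot / norm if norm else 0.0


def extract_features(query, entity, doc_freq, n_docs, now):
    """entity is a dict with hypothetical keys:
    'fields' (source name -> list of description strings),
    'original' (original KB description), 'last_update' (timestamp)."""
    q_vec = tfidf_vector(tokenize(query), doc_freq, n_docs)
    original_terms = set(tokenize(entity["original"]))
    features = {}
    for name, descriptions in entity["fields"].items():
        tokens = tokenize(" ".join(descriptions))
        f_vec = tfidf_vector(tokens, doc_freq, n_docs)
        # 1. Field similarity: TFxIDF similarity between q and the field
        features[f"sim:{name}"] = cosine(q_vec, f_vec)
        # 2. Field importance: length, number of updates, novel terms
        features[f"len_words:{name}"] = len(tokens)
        features[f"updates:{name}"] = len(descriptions)
        features[f"novel:{name}"] = len(set(tokens) - original_terms)
    # 3. Entity importance: time since last update
    features["since_update"] = now - entity["last_update"]
    return features
```

In this sketch `doc_freq` and `n_docs` stand for collection statistics that would be computed over all entity fields beforehand.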

Supervised single-field weighting model

Each field’s contribution towards the final score is individually weighted; the weights are learned from clicks at set intervals.
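As a rough illustration of per-field weighting, the final score can be read as a weighted sum of per-field similarities. The linear form, the function names, and the toy entities `E1` and `E2` are assumptions of this sketch; the poster does not specify the exact model.

```python
def score(field_sims, weights):
    """Weighted sum of per-field similarity scores for one entity."""
    return sum(weights.get(field, 0.0) * sim for field, sim in field_sims.items())


def rank(candidates, weights):
    """candidates maps entity id -> {field name: similarity to the query}."""
    return sorted(candidates, key=lambda e: score(candidates[e], weights), reverse=True)


# Toy data (hypothetical): E1 matches the query mostly through its queries field.
candidates = {
    "E1": {"kb": 0.2, "queries": 0.9},
    "E2": {"kb": 0.5, "queries": 0.0},
}
kb_only = rank(candidates, {"kb": 1.0})
all_fields = rank(candidates, {"kb": 1.0, "queries": 1.0})
```

With only the KB field weighted, `E2` ranks first; once the queries field also carries weight, `E1` moves to the top, which is the effect the field weights are meant to capture.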

Dynamic Collective Entity Representations

Task and motivation (http://graus.nu/dcer-paper#abstract)

Entity ranking, i.e., successfully positioning a relevant entity at the top of the ranking for a given query, is inherently difficult due to the potential mismatch between the entity's description in a knowledge base and the way people refer to the entity when searching for it.

Collective entity representations (http://graus.nu/dcer-paper#step1)

We collect entity descriptions from a variety of sources and combine them into a single entity representation. To optimize retrieval effectiveness, we learn how to weight the content that each source contributes to an entity.
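A collective representation of this kind can be pictured as a fielded document that accumulates descriptions per source. The class and method names below are hypothetical, chosen only to make the idea concrete.

```python
from collections import defaultdict


class EntityRepresentation:
    """Fielded document: one field per description source (sketch)."""

    def __init__(self, entity_id, original_description):
        self.entity_id = entity_id
        self.fields = defaultdict(list)
        self.fields["kb"].append(original_description)

    def add_description(self, source, text):
        """Append a new description (tweet, tag, query, anchor) to its field."""
        self.fields[source].append(text)

    def as_document(self):
        """Flatten each field to a single string for indexing."""
        return {source: " ".join(texts) for source, texts in self.fields.items()}


rep = EntityRepresentation("Tupac Shakur", "American rapper, author, actor, and poet")
rep.add_description("tags", "dead rappers")
rep.add_description("tags", "hiphop/icons")
doc = rep.as_document()
```

Keeping the descriptions as a list per field, rather than one concatenated string, also makes the number of updates to a field easy to count.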

Adaptive Ranking (http://graus.nu/dcer-paper#step2)

We represent entities as fielded documents, where each field has content from a single description source. When new entity descriptions come in as a stream, learning weights for the fields in batch is not optimal. Dynamic entity representations boil down to dynamically learning to optimally weight the fields with content from different description sources. We exploit clicks to retrain our model and continually adjust the weights associated with the fields.
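The retraining loop can be sketched as follows. The poster only states that the ranker is retrained from clicks after each chunk of 500 queries; the pairwise perceptron update used here is a stand-in learner chosen for brevity, and all names are hypothetical.

```python
def score(field_sims, weights):
    return sum(weights.get(f, 0.0) * s for f, s in field_sims.items())


def retrain(weights, batch, lr=0.1, epochs=5):
    """Pairwise perceptron (an assumption): push the clicked entity above the rest."""
    w = dict(weights)
    for _ in range(epochs):
        for candidates, clicked in batch:
            pos = candidates[clicked]
            for entity, neg in candidates.items():
                if entity == clicked:
                    continue
                if score(pos, w) <= score(neg, w):  # clicked entity not ranked higher
                    for f in set(pos) | set(neg):
                        w[f] = w.get(f, 0.0) + lr * (pos.get(f, 0.0) - neg.get(f, 0.0))
    return w


def adaptive_ranking(click_stream, fields, chunk_size=500):
    """Retrain the field weights after every chunk_size clicked queries."""
    weights = {f: 1.0 for f in fields}
    history, batch = [dict(weights)], []
    for candidates, clicked in click_stream:
        batch.append((candidates, clicked))
        if len(batch) == chunk_size:
            weights = retrain(weights, batch)
            history.append(dict(weights))  # one snapshot per retraining step
            batch = []
    return weights, history
```

Each element of `click_stream` is a pair of per-candidate field similarities and the clicked entity id; `history` holds one weight snapshot per retraining step, which is the kind of series the field-weights-over-time plot shows.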

Abstract