Entity Representation and Retrieval

Laura Dietz, University of New Hampshire
Alexander Kotov, Wayne State University
Edgar Meij, Bloomberg

ICTIR 2016 Tutorial on Utilizing KGs in Text-centric IR



Knowledge Graphs

• A way to represent human knowledge in a machine-readable way
• Subjects correspond to entities designated by an identifier (a URI such as http://dbpedia.org/page/Barack_Obama in the case of DBpedia)
• Entities are connected with other entities, literals or scalars by relations or predicates (e.g. hasGenre, knownFor, marriedTo, isPCmemberOf, etc.)
• Each triple represents a simple fact (e.g. <http://dbpedia.org/page/Barack_Obama, marriedTo, http://dbpedia.org/page/Michelle_Obama>)
• Many subject-predicate-object (SPO) triples → knowledge graph
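As a minimal sketch of the triple view described on this slide, the snippet below stores a few SPO triples and indexes them by subject. The `knownFor` values and the helper name `build_graph` are illustrative assumptions, not part of the tutorial.

```python
from collections import defaultdict

DBP = "http://dbpedia.org/page/"

# Toy triples: the predicate names appear on the slide, the literal values are made up.
triples = [
    (DBP + "Barack_Obama", "marriedTo", DBP + "Michelle_Obama"),
    (DBP + "Barack_Obama", "knownFor", "44th president of the United States"),
    (DBP + "Michelle_Obama", "knownFor", "First Lady of the United States"),
]


def build_graph(spo_triples):
    """Index triples by subject: subject -> list of (predicate, object)."""
    graph = defaultdict(list)
    for subject, predicate, obj in spo_triples:
        graph[subject].append((predicate, obj))
    return graph


graph = build_graph(triples)
obama_edges = graph[DBP + "Barack_Obama"]
```

Each subject maps to its outgoing edges, so the set of all such adjacency lists is the knowledge graph.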


Entity Retrieval from Knowledge Graph(s) (ERKG) (1)

• Users often search for specific material or abstract entities (objects), such as people, products or locations, instead of documents that merely mention them
• Answers are names of entities (or entity representations) rather than articles discussing them
• Users are willing to express their information need more elaborately than with a few keywords [Balog et al. 2008]
• Knowledge graphs are well suited to addressing these information needs


Entity Retrieval from Knowledge Graph(s) (ERKG) (2)

• Assumes keyword queries (structured queries are studied more in the DB community)
• Different from entity linking, where the goal is to identify which entities a searcher refers to in her query (part 1)
• Different from ad hoc entity retrieval, which is focused on retrieving entities embedded in documents and using knowledge bases to improve document retrieval (part 3)
• Unique IR problem: there is no notion of a document
• Challenging IR problem: knowledge graphs are designed for graph-pattern queries and automated reasoning


Typical ERKG tasks

• Entity Search: simple queries aimed at finding a particular entity or an entity which is an attribute of another entity
    • “Ben Franklin”
    • “Einstein Relativity theory”
    • “England football player highest paid”

• List Search: descriptive queries with several relevant entities
    • “US presidents since 1960”
    • “animals lay eggs mammals”
    • “Formula 1 drivers that won the Monaco Grand Prix”

• Question Answering: queries are questions in natural language
    • “Who founded Intel?”
    • “For which label did Elvis record his first album?”


Research challenges in ERKG

ERKG requires accurate interpretation of unstructured textual queries and matching them with structured entity semantics:

1. How to design entity representations that capture the semantics of entity properties/relations and are effective for entity retrieval?

2. How to develop accurate and efficient entity retrieval models?


Outline

• Entity representation
• Entity retrieval
• Entity ranking
• Entities and documents


From Entity Graph to Entity Documents

Build a textual representation (i.e. a “document”) for each entity by considering all triples in which it appears as the subject (or object)
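A minimal sketch of this step, assuming triples are plain `(subject, predicate, object)` tuples of strings. The function name `entity_document` and the toy predicates are hypothetical and only illustrate the idea of flattening triples into text.

```python
def entity_document(entity, triples, include_object_role=True):
    """Concatenate the text of every triple that mentions `entity`."""
    parts = []
    for subject, predicate, obj in triples:
        if subject == entity:
            # Entity is the subject: keep predicate and object text.
            parts.append(f"{predicate} {obj}")
        elif include_object_role and obj == entity:
            # Entity is the object: keep the subject and predicate text.
            parts.append(f"{subject} {predicate}")
    return " ".join(parts).lower()


toy_triples = [
    ("Barack_Obama", "birthPlace", "Honolulu"),
    ("Barack_Obama", "spouse", "Michelle_Obama"),
    ("Michelle_Obama", "spouse", "Barack_Obama"),
]
doc = entity_document("Barack_Obama", toy_triples)
```

The resulting flat string can be indexed by any standard text retrieval engine, at the cost of losing the predicate structure that the fielded representations on the following slides try to preserve.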


Structured Entity Documents (1)

• Entity descriptions are naturally structured, so entities can be represented as fielded documents
• In the simplest case, each predicate corresponds to one document field
• However, a knowledge graph can contain a very large number of distinct predicates → optimizing a separate importance weight for every field becomes computationally intractable


Structured Entity Documents (2)

Predicate folding: group predicates together into a small set of predefined categories → entity documents with a smaller number of fields


Predicate Folding

• Grouping according to type (attributes, incoming/outgoing links) [Perez-Aguera et al. 2010]
• Grouping according to importance (determined based on predicate popularity) [Blanco et al. 2010]


2-field Entity Document [Neumayer, Balog et al., ECIR’12]

Each entity is represented as a two-field document:

• title: object values belonging to predicates ending with “name”, “label” or “title”
• content: object values for the 1000 most frequent predicates, concatenated together into a flat text representation


3-field Entity Document [Zhiltsov and Agichtein, CIKM’13]

Each entity is represented as a three-field document:

• names: literals of foaf:name and rdfs:label predicates, along with tokens extracted from entity URIs
• attributes: literals of all other predicates
• outgoing links: names of entities in the object position
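The folding of predicates into three fields can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: triples carry an explicit `object_is_uri` flag, the `ex:` predicates are hypothetical, and `names` is an assumed lookup from entity URI to display name.

```python
import re

NAME_PREDICATES = {"foaf:name", "rdfs:label"}


def uri_tokens(uri):
    """Split the last URI segment into lowercase word tokens."""
    local = uri.rstrip("/").rsplit("/", 1)[-1]
    return [t.lower() for t in re.split(r"[_\W]+", local) if t]


def three_field_document(entity_uri, triples, names):
    """Fold the triples of one entity into names / attributes / outgoing_links.

    triples: iterable of (subject, predicate, object, object_is_uri).
    names:   dict mapping an entity URI to its display name (assumed available).
    """
    doc = {"names": uri_tokens(entity_uri), "attributes": [], "outgoing_links": []}
    for subject, predicate, obj, object_is_uri in triples:
        if subject != entity_uri:
            continue
        if object_is_uri:
            # Resolve the linked entity to a name; fall back to its URI tokens.
            doc["outgoing_links"].append(names.get(obj, " ".join(uri_tokens(obj))))
        elif predicate in NAME_PREDICATES:
            doc["names"].append(obj.lower())
        else:
            doc["attributes"].append(obj.lower())
    return doc


OBAMA = "http://dbpedia.org/page/Barack_Obama"
MICHELLE = "http://dbpedia.org/page/Michelle_Obama"
toy_triples = [
    (OBAMA, "foaf:name", "Barack Obama", False),
    (OBAMA, "ex:office", "44th President", False),   # hypothetical predicate
    (OBAMA, "ex:spouse", MICHELLE, True),            # hypothetical predicate
]
fielded = three_field_document(OBAMA, toy_triples, {MICHELLE: "michelle obama"})
```

With only three fields, the number of field weights to tune stays fixed no matter how many predicates the knowledge graph contains.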


5-field Entity Document [Zhiltsov, Kotov et al., SIGIR’15]

Each entity is represented as a five-field document:

• names: conventional names of entities, such as the name of a person or the name of an organization
• attributes: all entity properties, other than names
• categories: classes or groups to which the entity has been assigned
• similar entity names: names of the entities that are very similar or identical to a given entity
• related entity names: names of entities in the object position


5-field Entity Document Example

Entity document for the DBpedia entity Barack Obama.

Field                   Content
names                   barack obama barack hussein obama ii
attributes              44th current president united states birth place honolulu hawaii
categories              democratic party united states senator nobel peace prize laureate christian
similar entity names    barack obama jr barak hussein obama barack h obama ii
related entity names    spouse michelle obama illinois state predecessor george walker bush


Hierarchical Entity Model [Neumayer, Balog et al., ECIR’12]

Entity document fields are organized into a 2-level hierarchy:

• Predicate types are on the top level:
    • name: the subject is E, the object is a literal and the predicate comes from a predefined list (e.g. foaf:name or rdfs:label) or ends with “name”, “label” or “title”
    • attributes: the subject is E, the object is a literal and the predicate is not of type name
    • outgoing links: the subject is E and the object is a URI; the URI is resolved by replacing it with the entity name
    • incoming links: E is the object; the subject entity URI is resolved

• Individual predicates are at the bottom level


Dynamic Entity Representation [Graus, Tsagkias et al., WSDM’16]

• Problem: vocabulary mismatch between an entity’s description in a knowledge base and the way people refer to the entity when searching for it

• Entity representations should account for:
    • Context: entities can appear in different contexts (e.g. Germany should be returned for queries related to World War II and the 2014 Soccer World Cup)
    • Time: entities are not static in how they are perceived (e.g. Ferguson, Missouri before and after August 2014)


Approach (1)

Leverage collective intelligence provided by different entity description sources (KBs, web anchors, tweets, social tags, query log) to fill in the “vocabulary gap”:

• Create and update entity representations based on different sources
• Combine different entity descriptions for retrieval at specific time intervals by dynamically assigning weights to different sources


Approach (2)


Dynamic Entity Representation

Represent entities as fielded documents, in which each field corresponds to the content that comes from one description source:

• Knowledge base: anchor text of inter-knowledge base hyperlinks, redirects, category titles, names of entities that are linked from and to each entity in Wikipedia
• Web anchors: anchor text of links to Wikipedia pages from the Google Wikilinks corpus
• Twitter: all English tweets that contain links to Wikipedia pages representing entities in the used snapshot
• Delicious: tags associated with Wikipedia pages in the SocialBM0311 dataset
• Queries: queries that result in clicks on Wikipedia pages in the used snapshot


Entity Updates

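The fields of an entity document

$$e = \{f^e_{\text{title}}, f^e_{\text{text}}, f^e_{\text{anchors}}, \ldots, f^e_{\text{query}}\}$$

are updated at each discretized time point $T = \{t_1, t_2, t_3, \ldots, t_n\}$:

$$f^e_{\text{query}}(t_i) = f^e_{\text{query}}(t_{i-1}) + \begin{cases} q, & \text{if } e \text{ is clicked} \\ 0, & \text{otherwise} \end{cases}$$

$$f^e_{\text{tweets}}(t_i) = f^e_{\text{tweets}}(t_{i-1}) + \text{tweet}_e$$

$$f^e_{\text{tags}}(t_i) = f^e_{\text{tags}}(t_{i-1}) + \text{tag}_e$$

Each field’s contribution towards the final entity score is determined based on features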
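The update rules can be sketched as an append-only fielded document. The class and function names below (`DynamicEntity`, `observe_query`) and the example strings are illustrative assumptions; the update counter is kept because the number of updates per field is later used as a feature.

```python
from collections import defaultdict


class DynamicEntity:
    """Fielded entity document whose fields grow as new descriptions arrive."""

    def __init__(self, title, text):
        self.fields = defaultdict(list)
        self.fields["title"].append(title)
        self.fields["text"].append(text)
        self.update_counts = defaultdict(int)  # number of updates per field

    def update(self, field, content):
        """Append new content (a query, tweet or tag) to a dynamic field."""
        self.fields[field].append(content)
        self.update_counts[field] += 1


def observe_query(entity, query, clicked):
    """The query field gains q only when the entity was clicked for that query."""
    if clicked:
        entity.update("query", query)


ferguson = DynamicEntity("Ferguson, Missouri", "city in St. Louis County")
observe_query(ferguson, "ferguson protests", clicked=True)
observe_query(ferguson, "alex ferguson", clicked=False)
ferguson.update("tags", "missouri")
```

Only the clicked query is added, so the query field accumulates the vocabulary searchers actually use for the entity.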


Features

• Field similarity: TF-IDF cosine similarity of query and field f at time t_i
• Field importance (favor fields with more novel content): field’s length in terms; field’s length in characters; field’s novelty at time t_i (favor fields with unseen, newly associated terms); number of updates to the field from t_0 through t_i
• Entity importance (favor recently updated entities): time since the last entity update

A classification-based ranker supervised by clicks learns the optimal feature weights


Results

Figure panels: (a) adaptive runs, (b) non-adaptive runs

• Social tags are the best performing single entity description source
• KB+queries yields a substantial relative improvement → added queries provide a strong signal for ranking the clicked entities
• Rankers that incorporate dynamic description sources (i.e. KB+tags, KB+tweets and KB+queries) show the highest learning rate → entity content from these sources accounts for changes in entity representations over time


Architecture of ERKG Methods [Tonon, Demartini et al., SIGIR’12]


Results Expansion Strategy

1. Retrieve an initial list of entities matching the query using a standard retrieval function (BM25)

2. Expand the retrieved results by exploiting the structure of the knowledge graph (retrieved entities can be used as starting points for simple graph traversals, i.e. finding neighbors)

3. Filter out expanded results, removing those with low similarity to the original query

4. Re-rank the results
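The four steps can be sketched as a small pipeline. This is a simplified illustration, not the method of the cited paper: the first-pass scores are passed in rather than computed by BM25, `token_overlap` is a stand-in similarity, the threshold is arbitrary, and expanded entities are scored by similarity alone.

```python
def token_overlap(query, entity_name):
    """Toy Jaccard similarity between query tokens and entity-name tokens."""
    q, e = set(query.lower().split()), set(entity_name.lower().split())
    return len(q & e) / len(q | e) if q | e else 0.0


def expand_and_rerank(query, initial_ranking, neighbors, similarity, threshold=0.2):
    """initial_ranking: list of (entity, score) from a first-pass ranker.
    neighbors: dict entity -> iterable of adjacent entities in the graph.
    similarity: callable (query, entity) -> float."""
    scores = dict(initial_ranking)                  # step 1: initial results
    candidates = set()
    for entity in scores:                           # step 2: one-hop expansion
        candidates.update(neighbors.get(entity, ()))
    for entity in candidates:
        if entity in scores:
            continue
        sim = similarity(query, entity)
        if sim >= threshold:                        # step 3: filter by similarity
            scores[entity] = sim
    # step 4: re-rank everything by score, highest first
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


ranked = expand_and_rerank(
    "barack obama",
    [("Barack Obama", 2.0)],
    {"Barack Obama": ["Barack Obama Jr", "Honolulu"]},
    token_overlap,
)
```

Here the neighbor "Barack Obama Jr" survives the filter while "Honolulu" is dropped, which mirrors the intent of step 3: graph neighbors are only useful if they still resemble the query.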


Result Expansion Strategies

• Follow predicates leading to other entities
• Follow datatype properties leading to additional entity attributes
• Explore just the neighborhood of a node and the neighbors of neighbors


Predicates to Follow


Results

• The simple S1_1 approach, which exploits <owl:sameAs> links plus Wikipedia redirect and disambiguation information, performs best, obtaining a 25% improvement in MAP over the BM25 baseline on the 2010 dataset


Setting Field Weights

• Structured entity documents can be retrieved using structured document retrieval models (BM25F, MLM)

• Problem: how to set the weights of document fields?
    • Heuristically: proportionate to the length of content in the field
    • Empirically: by optimizing the target retrieval metric using training queries


Fielded Sequential Dependence Model [Zhiltsov, Kotov et al., SIGIR’15]

Previous research in ad-hoc IR has focused on two major directions:

• unigram bag-of-words retrieval models for multi-fielded documents
    • Ogilvie and Callan. Combining Document Representations for Known-item Search, SIGIR’03 (MLM)
    • Robertson et al. Simple BM25 Extension to Multiple Weighted Fields, CIKM’04 (BM25F)

• retrieval models incorporating term dependencies
    • Metzler and Croft. A Markov Random Field Model for Term Dependencies, SIGIR’05 (SDM)

Goal: to develop a retrieval model that captures both document structure and term dependencies


Sequential and Full Dependence Models [Metzler and Croft, SIGIR’05]

Documents are ranked with respect to

$$P_\Lambda(D|Q) \stackrel{rank}{=} \sum_{i \in \{T, U, O\}} \lambda_i f_i(Q, D)$$

The potential function for unigrams is the query likelihood (QL):

$$f_T(q_i, D) = \log P(q_i|\theta_D) = \log \frac{tf_{q_i,D} + \mu \frac{cf_{q_i}}{|C|}}{|D| + \mu}$$

SDM only considers two-word sequences in queries; FDM considers all two-word combinations.
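The unigram potential is a Dirichlet-smoothed query likelihood, which can be computed directly from term counts. The sketch below assumes tokenized documents and a precomputed collection frequency table; the function name and the default `mu=2000.0` are illustrative choices, not values from the tutorial.

```python
import math
from collections import Counter


def ql_unigram_potential(term, doc_tokens, collection_tf, collection_len, mu=2000.0):
    """f_T(q_i, D) = log((tf + mu * cf / |C|) / (|D| + mu))."""
    tf = Counter(doc_tokens)[term]          # term frequency in the document
    cf = collection_tf.get(term, 0)         # term frequency in the collection
    prob = (tf + mu * cf / collection_len) / (len(doc_tokens) + mu)
    return math.log(prob) if prob > 0 else float("-inf")


doc_tokens = ["barack", "obama", "president"]
collection_tf = {"obama": 10, "president": 50}
score = ql_unigram_potential("obama", doc_tokens, collection_tf, 1000, mu=10.0)
```

In this toy setting the score equals log(1.1 / 13): one occurrence in the document plus a smoothing mass of 10 × 10/1000, divided by the document length 3 plus μ = 10.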


FSDM ranking function

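FSDM incorporates document structure and term dependencies with the following ranking function:

$$P_\Lambda(D|Q) \stackrel{rank}{=} \lambda_T \sum_{q \in Q} f_T(q_i, D) + \lambda_O \sum_{q \in Q} f_O(q_i, q_{i+1}, D) + \lambda_U \sum_{q \in Q} f_U(q_i, q_{i+1}, D)$$

Here $f_T$ scores unigrams, $f_O$ scores ordered bigrams and $f_U$ scores unordered bigrams.

Separate MLMs for bigrams and unigrams give FSDM the flexibility to adjust the document scoring depending on the query type

MLM is a special case of FSDM, when $\lambda_T = 1, \lambda_O = 0, \lambda_U = 0$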


FSDM ranking function

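Potential function for unigrams in case of FSDM:

$$f_T(q_i, D) = \log \sum_j w^T_j P(q_i|\theta^j_D) = \log \sum_j w^T_j \frac{tf_{q_i,D^j} + \mu_j \frac{cf^j_{q_i}}{|C_j|}}{|D^j| + \mu_j}$$

Example: in the query “apollo astronauts who walked on the moon”, the concept “apollo astronauts” maps to the category field and “who walked on the moon” maps to the attribute field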
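The fielded unigram potential is a weighted mixture of per-field smoothed language models. A minimal sketch follows, assuming each field has its own token list, collection statistics and smoothing parameter; the dictionary layout and all names and numbers are hypothetical.

```python
import math


def fsdm_unigram_potential(term, doc_fields, field_weights, coll_stats, mu):
    """log sum_j w_j * (tf_j + mu_j * cf_j / |C_j|) / (|D_j| + mu_j).

    doc_fields:    field -> list of tokens of the entity document in that field
    field_weights: field -> w_j (non-negative, summing to 1)
    coll_stats:    field -> {"cf": {term: count}, "len": |C_j|}
    mu:            field -> Dirichlet prior mu_j
    """
    total = 0.0
    for field, weight in field_weights.items():
        tokens = doc_fields.get(field, [])
        cf = coll_stats[field]["cf"].get(term, 0)
        coll_len = coll_stats[field]["len"]
        p = (tokens.count(term) + mu[field] * cf / coll_len) / (len(tokens) + mu[field])
        total += weight * p
    return math.log(total) if total > 0 else float("-inf")


doc_fields = {"names": ["neil", "armstrong"], "attributes": ["walked", "moon", "apollo"]}
coll_stats = {
    "names": {"cf": {"moon": 1}, "len": 100},
    "attributes": {"cf": {"moon": 5}, "len": 100},
}
mu = {"names": 10.0, "attributes": 10.0}
score = fsdm_unigram_potential(
    "moon", doc_fields, {"names": 0.2, "attributes": 0.8}, coll_stats, mu
)
```

Shifting weight towards the field where the term actually occurs raises the score, which is why per-field weights matter for queries whose parts target different fields.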


Experiments

• DBpedia 3.7 as a knowledge graph
• Queries from Balog and Neumayer. A Test Collection for Entity Search in DBpedia, SIGIR’13.

Query set       Amount   Query types [Pound et al., 2010]
SemSearch ES    130      Entity
ListSearch      115      Type
INEX-LD         100      Entity, Type, Attribute, Relation
QALD-2          140      Entity, Type, Attribute, Relation


Results

Query set      Method    MAP       P@10      P@20      b-pref
SemSearch ES   MLM-CA    0.320     0.250     0.179     0.674
               SDM-CA    0.254∗    0.202∗    0.149∗    0.671
               FSDM      0.386∗†   0.286∗†   0.204∗†   0.750∗†
ListSearch     MLM-CA    0.190     0.252     0.192     0.428
               SDM-CA    0.197     0.252     0.202     0.471∗
               FSDM      0.203     0.256     0.203     0.466∗
INEX-LD        MLM-CA    0.102     0.238     0.190     0.318
               SDM-CA    0.117∗    0.258     0.199     0.335
               FSDM      0.111∗    0.263∗    0.215∗†   0.341∗
QALD-2         MLM-CA    0.152     0.103     0.084     0.373
               SDM-CA    0.184     0.106     0.090     0.465∗
               FSDM      0.195∗    0.136∗†   0.111∗    0.466∗
All queries    MLM-CA    0.196     0.206     0.157     0.455
               SDM-CA    0.192     0.198     0.155     0.495∗
               FSDM      0.231∗†   0.231∗†   0.179∗†   0.517∗†


FSDM limitation

In FSDM, field weights are the same for all query concepts of the same type.

Example: capitals in Europe which were host cities of summer Olympic games


Parametric extension of FSDM

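$$w^T_{q_i,j} = \sum_k \alpha^U_{j,k} \phi_k(q_i, j)$$

• $\phi_k(q_i, j)$ is the k-th feature value for unigram $q_i$ in field $j$.
• $\alpha^U_{j,k}$ are feature weights that we learn.

$$\sum_j w^T_{q_i,j} = 1, \quad w^T_{q_i,j} \geq 0, \quad \alpha^U_{j,k} \geq 0, \quad 0 \leq \phi_k(q_i, j) \leq 1$$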
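A minimal sketch of how a per-concept field weight could be computed from features. The slide states the sum-to-one constraint but not how it is enforced, so the explicit rescaling over fields and the uniform fallback below are assumptions made for illustration only; the function name and toy values are hypothetical.

```python
def parametric_field_weights(features, alpha):
    """features: field -> list of feature values phi_k(q_i, j), each in [0, 1].
    alpha:    field -> list of non-negative feature weights alpha_{j,k}.
    Returns field -> w_{q_i, j}, rescaled so that the weights sum to 1."""
    raw = {
        field: sum(a * phi for a, phi in zip(alpha[field], phis))
        for field, phis in features.items()
    }
    total = sum(raw.values())
    if total == 0:
        # Assumed fallback: no feature fires, so weight all fields equally.
        return {field: 1.0 / len(raw) for field in raw}
    return {field: value / total for field, value in raw.items()}


features = {"names": [1.0, 0.2], "categories": [0.0, 0.6]}
alpha = {"names": [0.5, 0.5], "categories": [0.5, 0.5]}
weights = parametric_field_weights(features, alpha)
```

Because the features depend on the concept, two unigrams in the same query can now receive different field weights, which is exactly the limitation of FSDM that the parametric extension targets.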


Features

Source                  Feature     Description                                                  CT
Collection statistics   FP(κ, j)    Posterior probability P(E_j|w).                              UG BG
                        TS(κ, j)    Top SDM score on j-th field when κ is used as a query.       BG
Stanford POS Tagger     NNP(κ)      Is concept κ a proper noun?                                  UG
                        NNS(κ)      Is κ a plural non-proper noun?                               UG BG
                        JJS(κ)      Is κ a superlative adjective?                                UG
Stanford Parser         NPP(κ)      Is κ part of a noun phrase?                                  BG
                        NNO(κ)      Is κ the only singular non-proper noun in a noun phrase?     UG
                        INT         Intercept feature (= 1).                                     UG BG

CT: concept type (UG = unigram, BG = bigram)


Learning-to-Rank Entities [Dali and Fortuna, WWW’11]

• Variety of features:
    • Popularity and importance of Wikipedia page: # of accesses from logs, # of edits, page length
    • RDF features: # of triples in which E is the subject, the object, or the subject with a literal object; # of categories the Wikipedia page for E belongs to; size of the biggest/smallest/median category
    • HITS scores and PageRank of the Wikipedia page and of E in the RDF graph
    • # of hits from a search engine API for the top 5 keywords from the abstract of the Wikipedia page for E
    • Count of entity name in Google N-grams

• RankSVM learning-to-rank method


Evaluation

• Initial set of entities obtained using SPARQL queries
• 14 example queries for DBpedia and 27 example queries for Yago
• Example queries: “Which athlete was born in Philadelphia?”, “List of Schalke 04 players”, “Which countries have French as an official language?”, “Which objects are heavier than the Iosif Stalin tank?”


Feature Importance

• Features approximating the importance, hub and authority scores, and PageRank of the Wikipedia page are effective
• PageRank and HITS scores on the RDF graph are not effective (outperformed by simpler RDF features)
• Google N-grams is an effective proxy for entity popularity, cheaper than a search engine API
• Feature combinations improve both robustness and accuracy of ranking


Transfer Learning

• The ranking model was trained on DBpedia questions and applied to Yago questions
• Only feature set A (all features) results in robust ranking model transfer
• In general, the ranking models for different knowledge graphs are non-transferable, unless they have been learned on a large number of features
• The biggest inconsistencies occur in the models trained on graph-based features → knowledge graphs preserve particularities reflecting their designers’ decisions


Latent Dimensional Representation [Zhiltsov and Agichtein, CIKM’13]

• Compact representation of entities in a low-dimensional space by using a modified algorithm for tensor factorization
• Entities and entity-query pairs are represented with term-based and structural features


Knowledge Graph as Tensor

• For a knowledge graph with n distinct entities and m distinct predicates, we construct a tensor X of size n × n × m, where X_ijk = 1 if the k-th predicate holds between the i-th entity and the j-th entity, and X_ijk = 0 otherwise
• Each k-th frontal tensor slice X_k is an adjacency matrix for the k-th predicate, which is sparse
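The tensor construction can be sketched with nested lists. This dense layout is for illustration only (a realistic graph would need sparse slices), the tensor is stored slice-first so that `X[k]` is the k-th frontal slice, and the predicate `bornIn` is a made-up example.

```python
def build_tensor(triples):
    """Return (X, entities, predicates) with X[k][i][j] = 1 iff the k-th
    predicate links the i-th entity (subject) to the j-th entity (object)."""
    entities = sorted({s for s, _, _ in triples} | {o for _, _, o in triples})
    predicates = sorted({p for _, p, _ in triples})
    e_idx = {e: i for i, e in enumerate(entities)}
    p_idx = {p: k for k, p in enumerate(predicates)}
    n = len(entities)
    # One n x n adjacency matrix per predicate, initialised to zeros.
    X = [[[0] * n for _ in range(n)] for _ in predicates]
    for s, p, o in triples:
        X[p_idx[p]][e_idx[s]][e_idx[o]] = 1
    return X, entities, predicates


toy_triples = [
    ("Barack_Obama", "marriedTo", "Michelle_Obama"),
    ("Barack_Obama", "bornIn", "Honolulu"),
]
X, entities, predicates = build_tensor(toy_triples)
```

With three entities and two predicates the tensor holds 18 cells, only two of which are non-zero, which illustrates why the slices are sparse.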


RESCAL Tensor Factorization [Nickel, Tresp, et al., WWW’12]

I Given the number of latent factors r, we factorize each slice $X_k$ into the matrix product:

$$X_k \approx A R_k A^{T}, \quad k = 1, \dots, m,$$

where $A$ is a dense $n \times r$ matrix of latent entity embeddings (one row per entity), and $R_k$ is an $r \times r$ matrix of latent factors for the k-th predicate
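To make the model form concrete, the sketch below scores a triple as $a_i^{T} R_k a_j$ and solves for each $R_k$ by least squares when $A$ is held fixed. It is only one half of an alternating scheme and is not the full RESCAL algorithm, which also updates $A$ and adds regularization. The function names and the synthetic, exactly factorizable tensor are assumptions made for illustration.

```python
import numpy as np

def solve_relation_matrices(X, A):
    """Least-squares R_k for every frontal slice X_k, with embeddings A held fixed."""
    A_pinv = np.linalg.pinv(A)  # r x n pseudoinverse of the embedding matrix
    m = X.shape[2]
    # argmin_R ||X_k - A R A^T||_F is A^+ X_k (A^+)^T
    return np.stack([A_pinv @ X[:, :, k] @ A_pinv.T for k in range(m)])  # m x r x r

def rescal_score(A, R, i, j, k):
    """Model score a_i^T R_k a_j for the triple (entity i, predicate k, entity j)."""
    return float(A[i] @ R[k] @ A[j])

# Synthetic example: a real-valued tensor built from known factors
rng = np.random.default_rng(0)
n, r, m = 6, 2, 3
A_true = rng.normal(size=(n, r))
R_true = rng.normal(size=(m, r, r))
X = np.stack([A_true @ R_true[k] @ A_true.T for k in range(m)], axis=2)  # n x n x m

R_hat = solve_relation_matrices(X, A_true)
```

Because this synthetic tensor is exactly of the form $A R_k A^{T}$, `R_hat` recovers `R_true` up to numerical error. With a binary knowledge-graph tensor and small r, the fit is only approximate.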


Retrieval Method

1. Retrieve an initial set of entities using MLM

2. Re-rank the entities using Gradient Boosted Regression Trees (GBRT)
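The two steps above can be sketched as a retrieve-then-rerank pipeline. The first stage below scores entities with a weighted mixture of field language models. The Dirichlet smoothing, the probability floor for unseen terms, the field weights and the toy entity descriptions are all assumptions made for this example. The second stage takes any scoring function, which stands in for a trained GBRT model.

```python
import math
from collections import Counter

def collection_stats(entities, fields):
    """Per-field term counts over all entity descriptions."""
    stats = {f: Counter() for f in fields}
    for desc in entities.values():
        for f in fields:
            stats[f].update(desc.get(f, []))
    return stats

def mlm_score(query, desc, stats, weights, mu=10.0, floor=1e-9):
    """Query log-likelihood under a weighted mixture of field language models."""
    score = 0.0
    for t in query:
        p_t = 0.0
        for f, w in weights.items():
            tokens = desc.get(f, [])
            total = sum(stats[f].values())
            # Collection probability, floored for unseen terms (assumption)
            p_coll = max(stats[f][t] / total, floor) if total else floor
            # Dirichlet-smoothed field language model (assumption)
            p_t += w * (tokens.count(t) + mu * p_coll) / (len(tokens) + mu)
        score += math.log(p_t)
    return score

def first_pass(query, entities, weights, k):
    """Stage 1: top-k entities by mixture-of-language-models score."""
    stats = collection_stats(entities, weights)
    ranked = sorted(entities,
                    key=lambda e: mlm_score(query, entities[e], stats, weights),
                    reverse=True)
    return ranked[:k]

def rerank(candidates, score_fn):
    """Stage 2: reorder candidates by a learned scorer (placeholder for GBRT)."""
    return sorted(candidates, key=score_fn, reverse=True)

# Toy fielded entity descriptions, invented for illustration
entities = {
    "Barack_Obama": {"name": ["barack", "obama"],
                     "attributes": ["president", "united", "states"]},
    "Michelle_Obama": {"name": ["michelle", "obama"],
                       "attributes": ["lawyer", "author"]},
    "Honolulu": {"name": ["honolulu"], "attributes": ["city", "hawaii"]},
}
weights = {"name": 0.5, "attributes": 0.5}
query = ["obama", "president"]

candidates = first_pass(query, entities, weights, k=2)

# Hypothetical scores standing in for the output of a trained re-ranker
learned_scores = {"Barack_Obama": 0.4, "Michelle_Obama": 0.9}
final_ranking = rerank(candidates, lambda e: learned_scores.get(e, 0.0))
```

Keeping the re-ranker as a plain callable makes it possible to swap in any learning-to-rank model without touching the first stage.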


Features

#   Feature
    Term-based features
1   Query length
2   Query clarity
3   Uniformly weighted MLM score
4   Bigram relevance score for the "name" field
5   Bigram relevance score for the "attributes" field
6   Bigram relevance score for the "outgoing links" field
    Structural features
7   Top-3 entity cosine similarity, $\cos(e, e_{top})$
8   Top-3 entity Euclidean distance, $\|e - e_{top}\|$
9   Top-3 entity heat kernel, $e^{-\|e - e_{top}\|^2 / \sigma}$
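The three structural features compare the latent embedding of a candidate entity with the embedding of a top-ranked entity. A minimal sketch follows, assuming non-zero embedding vectors. The function name, the default value of `sigma` and the choice to compute the features against one top entity at a time are assumptions, since the slide does not say how the top-3 entities are aggregated.

```python
import numpy as np

def structural_features(e, e_top, sigma=1.0):
    """Cosine similarity, Euclidean distance and heat kernel between two embeddings."""
    cosine = float(e @ e_top / (np.linalg.norm(e) * np.linalg.norm(e_top)))
    distance = float(np.linalg.norm(e - e_top))
    heat = float(np.exp(-distance ** 2 / sigma))
    return cosine, distance, heat

# Toy 2-dimensional embeddings, invented for illustration
e = np.array([1.0, 0.0])
e_top = np.array([0.0, 1.0])
cosine, distance, heat = structural_features(e, e_top)
```

For orthogonal unit vectors the cosine similarity is 0 and the distance is $\sqrt{2}$, so the heat kernel evaluates to $e^{-2}$ when $\sigma = 1$.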


Results

Features               NDCG              MAP              P@10
Term-based baseline    0.382             0.265            0.539
All features           0.401 (+5.0%)∗    0.276 (+4.2%)    0.561 (+4.1%)∗


Ranking KG Entities using Top Documents [Schuhmacher, Dietz et al., CIKM’15]

I Motivation: to address free-text, web-style queries that correspond to complex information needs and cannot be satisfied by a single entity or a list of homogeneous entities of the same type (e.g. “Argentine British relations”)

I Method:

1. Retrieve documents for a query using an entity-aware retrieval model (e.g. EQFE) or a standard one (e.g. SDM)

2. Link entity mentions in the top-k documents to entities in a KB (e.g. using KBBridge), or use existing annotations of TREC collections (e.g. FACC1 for ClueWeb09/ClueWeb12)

3. Rank the linked entities using a learning-to-rank framework that combines features based on the document collection and on structured KBs


Approach

[Figure: overview of the approach]

Features and rankers

I Features:

• Mention: number of entity occurrences in the top retrieved documents, weighted by entity IDF (MenFrqIdf);

• Query-Mention: normalized Levenshtein distance between the query and the mention (SED); similarity between aggregate representations of the query and the mention context using the GloVe (Glo) and JoBimText (Jo) distributional thesauri;

• Query-Entity: (a) compare the set of linked query entities with the top document entities: whether a document entity is present in the query (QEnt); whether there is a path between a document entity and a query entity (QEntEntSim); (b) retrieval with query keywords combined with text associated with document entities in the KB: entities returned by a Boolean model over Wikipedia articles (WikiBoolean); SDM retrieval score of the top 1000 Wikipedia articles (WikiSDM);

• Entity-Entity: whether there is a path between two entities in the DBpedia KG

I Rankers: pairwise (SVM-rank with a linear kernel, and with a linear kernel combined with a semantic smoothing kernel) and listwise (coordinate ascent using RankLib)
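Two of the features listed above are easy to illustrate: the normalized edit distance between a query and a mention, and the path-existence check between two entities. The sketch below is not the authors' implementation. Normalizing by the longer string, limiting the path search to `max_hops`, treating the graph as an adjacency dictionary and the toy graph itself are all assumptions for the example.

```python
from collections import deque

def levenshtein(a, b):
    """Edit distance between two strings (insert, delete, substitute cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution
        prev = curr
    return prev[-1]

def normalized_edit_distance(a, b):
    """Edit distance divided by the longer length, so the value lies in [0, 1]."""
    if not a and not b:
        return 0.0
    return levenshtein(a, b) / max(len(a), len(b))

def path_exists(graph, source, target, max_hops=2):
    """Breadth-first search: is target reachable from source within max_hops edges?"""
    frontier = deque([(source, 0)])
    seen = {source}
    while frontier:
        node, depth = frontier.popleft()
        if node == target:
            return True
        if depth < max_hops:
            for neighbour in graph.get(node, ()):
                if neighbour not in seen:
                    seen.add(neighbour)
                    frontier.append((neighbour, depth + 1))
    return False

# Toy adjacency lists, invented for illustration (not taken from DBpedia)
graph = {
    "Argentina": ["Falkland_Islands"],
    "Falkland_Islands": ["Argentina", "United_Kingdom"],
    "United_Kingdom": ["Falkland_Islands"],
}
sed = normalized_edit_distance("kitten", "sitting")
linked = path_exists(graph, "Argentina", "United_Kingdom", max_hops=2)
```

In this toy graph the two countries are connected through one intermediate node, so the check succeeds with `max_hops=2` and fails with `max_hops=1`.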


Results

[Figure: results on (a) Robust04 and (b) ClueWeb12]

I Authoritativeness correlates only marginally with relevance (entities ranked high by PageRank are very general)

I The best results are obtained when ranking using SDM (supported by INEX results) and normalized mention frequencies

I RankLib performs better than SVM-rank, with or without the semantic kernel


Feature importance

I Context query mention features (prefix C) perform worse than their no-context counterparts (prefix M)

I Context features based on edit distance and distributional similarity are not effective

I DBpedia-based features have a positive but insignificant influence on the overall performance, while Wikipedia-based features show a strong and significant influence


Takeaway messages

I Use dynamic entity representations built from different sources (not only the KB)

I Use retrieval models that account for different query concept types (FSDM and PFSDM) rather than standard fielded document retrieval models (BM25F and MLM) to obtain candidate entities

I Expand candidate entities by following KG links and using top-retrieved documents

I Re-rank candidate entities using a variety of features, including latent dimensional entity representations


Thank you!


References (1)

Entity Representation Methods:

• Neumayer, Balog et al. On the Modeling of Entities for Ad-hoc Entity Search in the Web of Data, ECIR’12

• Neumayer, Balog et al. When Simple is (more than) Good Enough: Effective Semantic Search with (almost) no Semantics, ECIR’12

• Zhiltsov, Kotov et al. Fielded Sequential Dependence Model for Ad-hoc Entity Retrieval in the Web of Data, SIGIR’15

• Zhiltsov and Agichtein. Improving Entity Search over Linked Data by Modeling Latent Semantics, CIKM’13

• Graus, Tsagkias et al. Dynamic Collective Entity Representations for Entity Ranking, WSDM’16


References (2)

Entity Retrieval:

• Dali and Fortuna. Learning to Rank for Semantic Search, WWW’11

• Tonon, Demartini et al. Combining Inverted Indices and Structured Search for Ad-hoc Object Retrieval, SIGIR’12

• Zhiltsov, Kotov et al. Fielded Sequential Dependence Model for Ad-hoc Entity Retrieval in the Web of Data, SIGIR’15

• Nikolaev, Kotov et al. Parameterized Fielded Term Dependence Models for Ad-hoc Entity Retrieval from Knowledge Graph, SIGIR’16

• Schuhmacher, Dietz et al. Ranking Entities for Web Queries through Text and Knowledge, CIKM’15
