Global and Local Wikification (GLOW) in TAC KBP Entity Linking Shared Task 2011




Lev Ratinov, Dan Roth

This research is supported by the Defense Advanced Research Projects Agency (DARPA) Machine Reading Program under Air Force Research Laboratory (AFRL) prime contract no. FA8750-09-C-0181 and by the Army Research Laboratory (ARL) under agreement W911NF-09-2-0053. Any opinions, findings, and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of DARPA, AFRL, ARL, or the US government.

GLOW Problem Formulation: bipartite matching

Γ* is a solution to the problem: a set of mention-title pairs (m,t). Evaluate the local matching quality using Φ(m,t). Evaluate the global structure based on (a) pairwise coherence scores Ψ(ti,tj) and (b) an approximate solution Γ'. Γ' allows disambiguating the mentions independently while taking into account the global structure.
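Written out, the objective behind this description has roughly the following shape (a sketch reconstructed from the text above and the formulation in the ACL 2011 paper; the poster's exact equation is not preserved in this transcript):

Γ* = argmax_Γ Σ_{(m,t) ∈ Γ} [ Φ(m,t) + Σ_{(m',t') ∈ Γ'} Ψ(t,t') ]

Here Γ' is a preliminary title assignment (for example, each mention's best title taken in isolation), so each mention can be re-disambiguated independently while the Ψ terms still reward titles that cohere with the rest of the document.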


Visit our demo: http://cogcomp.cs.illinois.edu/demo/wikify/

Chicago was used by default for Mac menus through MacOS 7.6, and OS 8 was released mid-1997.

TAC QUERY: (ID=2012, Form=“Ford”, Text=“The Ford Presidential Library is named after President Gerald Ford”)

1) MENTION IDENTIFICATION: “The [Ford]_m1 Presidential Library is named after President [Gerald Ford]_m2”

Each identified mention is linked to a Wikipedia title together with its scores (used in step 3 to pick a single title):
(m1, http://en.wikipedia.org/wiki/Ford_Motor_Company, 0.1, -0.1)
(m2, http://en.wikipedia.org/wiki/President_Gerald_Ford, 0.2, 0.7)

KBP TAC Knowledgebase: … Michael Jordan (basketball), Michael Jackson (singer), Gerald Ford (president) …

3) GLOW OUTPUT RECONCILIATION

QUERY MAPPING: the TAC query is mapped to the knowledge base entry Gerald Ford (president).
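A minimal sketch of this query-mapping step, assuming each knowledge base entry records the Wikipedia title it was derived from (the TAC reference KB is built from a Wikipedia snapshot); the entry IDs and helper function below are hypothetical, not the actual TAC KB schema:

# Hypothetical mapping from a disambiguated Wikipedia title to a TAC KB entry, or NIL.
TAC_KB = {
    "Gerald_Ford": "E0000123",      # hypothetical KB node IDs
    "Michael_Jackson": "E0000456",
    "Michael_Jordan": "E0000789",
}

def map_to_kb(wiki_title, kb=TAC_KB):
    """Return the KB node ID for a Wikipedia title, or "NIL" if there is no entry."""
    if wiki_title is None:          # GLOW produced no title for the query
        return "NIL"
    return kb.get(wiki_title, "NIL")

print(map_to_kb("Gerald_Ford"))          # an entity present in the KB excerpt
print(map_to_kb("Chicago_(typeface)"))   # not in the KB excerpt above -> NIL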

• Is a Macintosh font
• Has a distinctive N
• Used in Mac OS 7.6
• …

“It’s a version of Chicago – the standard classic Macintosh menu font, with that distinctive thick diagonal in the “N”.”

2) GLOW DISAMBIGUATION

TAC QUERY: (ID=2012, Form=“Ford”, Text=“The Ford Presidential Library is named after President Gerald Ford”)

KBP TAC Knowledgebase: … Michael Jordan (basketball), Michael Jackson (singer), Gerald Ford (president) …

Vision: aggregate information about an entity from multiple documents.

Task methodology: map queries to a TAC entity database.

TAC QUERY: (ID=2017, Form=“Michael”, Text=“This video shows Michael Jackson performing Billie Jean”)

Our approach: use the GLOW “disambiguation to Wikipedia” system.
Local and Global Algorithms for Disambiguation to Wikipedia. L. Ratinov, D. Downey, M. Anderson, and D. Roth (ACL 2011).

1) MENTION IDENTIFICATION

We have explored two strategies:
• Simple Query Identification (SIQI): mark the expressions in the text that match the query form exactly.
• Named Entity Query Identification (NEQI): identify the named entities in the text that match the query form approximately and normalize their spelling using Wikipedia (this poster illustrates NEQI). This is similar to query expansion.
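The two strategies can be sketched roughly as follows (a toy illustration; the inputs are hypothetical, and GLOW's actual NEQI also normalizes spelling through Wikipedia, which is not shown here):

import re
from difflib import SequenceMatcher

def siqi_mentions(query_form, text):
    """SIQI sketch: exact (case-insensitive) occurrences of the query form."""
    return [m.span() for m in re.finditer(re.escape(query_form), text, re.IGNORECASE)]

def neqi_mentions(query_form, named_entities, threshold=0.8):
    """NEQI sketch: named entities whose surface form approximately matches the
    query form; a real system would also use Wikipedia redirects and anchor texts."""
    matches = []
    for span, surface in named_entities:
        if SequenceMatcher(None, query_form.lower(), surface.lower()).ratio() >= threshold:
            matches.append(span)
    return matches

text = "The Ford Presidential Library is named after President Gerald Ford"
print(siqi_mentions("Ford", text))   # exact matches of "Ford"
# The named entities would come from an NER system; hard-coded here for illustration.
print(neqi_mentions("Gerald R. Ford", [((55, 66), "Gerald Ford")]))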

Experiments, Results (TAC 2011 Test Data)

Conclusions:
1) It is possible to apply a “disambiguation to Wikipedia” system directly to the TAC KBP Entity Linking task. We did not train our system on TAC data.
2) NEQI mention identification gains 4 B³ F1 points over SIQI.
3) All reasonable output reconciliation policies performed comparably.

3) GLOW OUTPUT RECONCILIATION

Given a set of mentions linked to the query, we need to provide a single Wikipedia title; however, each mention can be assigned a different title. We use the ranker scores and the linker scores to make the decision.

The “with linker” strategy discards mentions assigned a negative linker score (which means the objective function increases if we map these mentions to NULL). The “no linker” strategy uses all mentions. The decision on the single best-matching title is based on ranker scores: the “Max” strategy uses the single mention with the highest ranker score, while the “Sum” strategy sums the ranker scores of all the mentions assigned to the same title.

In the figure on the left, we illustrate the four resulting strategies along with the mentions they use and the resulting ranker scores for each title. The hollow circles indicate discarded mentions, while the full circles indicate mentions that contribute to the final title ranking scores.
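A compact sketch of the four reconciliation policies described above, assuming each linked mention comes as a (mention, proposed title, ranker score, linker score) tuple (the toy mentions and scores below are illustrative, not actual GLOW output):

# (mention id, proposed Wikipedia title, ranker score, linker score)
linked_mentions = [
    ("m1", "Ford_Motor_Company", 0.1, -0.1),
    ("m2", "President_Gerald_Ford", 0.2, 0.7),
]

def reconcile(mentions, use_linker=True, mode="sum"):
    """Pick a single title: optionally drop mentions with a negative linker score,
    then aggregate ranker scores per title by max or by sum."""
    if use_linker:                                   # "with linker" strategy
        mentions = [m for m in mentions if m[3] >= 0]
    scores = {}
    for _, title, ranker, _ in mentions:
        if mode == "sum":                            # "Sum" strategy
            scores[title] = scores.get(title, 0.0) + ranker
        else:                                        # "Max" strategy
            scores[title] = max(scores.get(title, float("-inf")), ranker)
    return max(scores, key=scores.get) if scores else None   # None ~ NIL

for use_linker in (True, False):
    for mode in ("max", "sum"):
        print(use_linker, mode, reconcile(linked_mentions, use_linker, mode))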