enhancing entity linking by combining ner models

11
Julien Plu, Giuseppe Rizzo, Raphaël Troncy [email protected] @julienplu Enhancing Entity Linking by Combining NER Models

Upload: julien-plu

Post on 18-Feb-2017

438 views

Category:

Software


1 download

TRANSCRIPT

Julien Plu, Giuseppe Rizzo, Raphaël [email protected]

@julienplu

Enhancing Entity Linking by Combining NER Models

ADEL in a nutshell

§ ADaptive Entity Linking Framework: http://multimediasemantics.github.io/adel/

§ OKE2015 Challenge winner

§ ADEL (in 2015):ØUse Stanford POS and Stanford NER for extracting named

entities

Ø Train and use one single CRF model via Stanford NER

ØNIL entities were all distinct

2016/05/31 - OKE Challenge at ESWC2016 – Heraklion, Crete - 2

What’s new in ADEL (2016)?

§ Use a generic NIF wrapper over NLP Core systems (Stanford CoreNLP, OpenNLP, etc.) as an external API

§ Use an algorithm to combine multiple CRFs model

§ Cluster NIL entities

§ Propose a generic index format and develop a new backend (Elasticsearch + Couchbase)

§ Filter candidate links based on types

2016/05/31 - OKE Challenge at ESWC2016 – Heraklion, Crete - 3

Different Approaches

E2E approaches:A dictionary of mentions and links is built from a

referent KB. A text is split in n-grams that are used to look up candidate links from the dictionary. A

selection function is used to pick up the best match

Linguistic-based approaches:

A text is parsed by a NER classifier. Entity mentions

are used to look up resources in a referent KB. A ranking function is used to select the best match

ADEL is a combination of both to make a hybrid approach

2016/05/31 - OKE Challenge at ESWC2016 – Heraklion, Crete - 4

ADEL from 30,000 foots

2016/05/31 - OKE Challenge at ESWC2016 – Heraklion, Crete

ADEL

Entity Extraction

Entity Linking Index

- 5

§ POS Tagger:Ø bidirectional

CMM (left to right and right to left)

§ NER Combiner:Ø Use a combination of CRF with Gibbs sampling (Monte Carlo as graph inference method)

models. A simple CRF model could be:

PER PER PERO OOO

X X X X XX XXXX

X set of features for the current word: word capitalized, previous word is “de”, next word is aNNP, … Suppose P(PER | X, PER, O, LOC) = P(PER | X, neighbors(PER)) then X with PER is a CRF

Jimmy Page , knowing the professionalism of John Paul Jones

Entity Extraction: Extractors Module

PER PERO

2016/05/31 - OKE Challenge at ESWC2016 – Heraklion, Crete - 6

Entity Extraction: Overlap Resolution

§ Detect overlaps among boundaries of entities coming from the extractors

§ Different heuristics can be applied:Ø Merge: (“United States” and “States of America” => “United States of

America”) default behavior

Ø Simple Substring: (“Florence” and “Florence May Harding” => ”Florence” and “May Harding”)

Ø Smart Substring: (”Giants of New York” and “New York” => “Giants” and “New York”)

2016/05/31 - OKE Challenge at ESWC2016 – Heraklion, Crete - 7

Index: Indexing

§ Use DBpedia and Wikipedia as knowledge bases

§ Integrate external data such as PageRank scores from Hasso Platner Institute

§ Backend sytem with Elasticsearch and Couchbase

§ Turn DBpedia and Wikipedia into a CSV-based generic format

2016/05/31 - OKE Challenge at ESWC2016 – Heraklion, Crete - 8

Entity Linking: Linking tasks

§ Generate candidate links for all extracted mentions:Ø If any, they go to the linking method

Ø If not, they are linked to NIL via NIL Clustering module

§ Linking method:Ø Filter out candidates that have different

types than the one given by NER

Ø ADEL linear formula:𝑟 𝑙 = 𝑎. 𝐿 𝑚, 𝑡𝑖𝑡𝑙𝑒 + 𝑏. max 𝐿 𝑚, 𝑅 + 𝑐. max 𝐿 𝑚, 𝐷 . 𝑃𝑅(𝑙)

r(l): the score of the candidate lL: the Levenshtein distancem:the extracted mentiontitle: the title of the candidate lR: the set of redirect pages associated to the candidate lD: the set of disambiguation pages associated to the candidate lPR: Pagerank associated to the candidate l

a,band c are weights following the properties:a>b>c and a+b+c=1

2016/05/31 - OKE Challenge at ESWC2016 – Heraklion, Crete - 9

ADEL 2015 vs ADEL 2016 in OKE

§ ADEL (2015 version) over OKE2015 test set (after adjudication)

§ ADEL (2016 version) over OKE2015 test set

§ ADEL (2016 version) over OKE2016 training set (4-fold validation)

2016/05/31 - OKE Challenge at ESWC2016 – Heraklion, Crete - 10

Precision Recall F-measure

extraction 78.2 65.4 71.2

recognition 65.8 54.8 59.8

linking 49.4 46.6 48

Precision Recall F-measure

extraction 85.1 89.7 87.3

recognition 75.3 59 66.2

linking 85.4 42.7 57

Precision Recall F-measure

extraction 81 88.7 84.7

recognition 78.1 85.4 81.6

linking 57.4 55.7 56.5

Questions?

Thank you for listening!

2016/05/31 - OKE Challenge at ESWC2016 – Heraklion, Crete - 11

http://multimediasemantics.github.io/adel

http://jplu.github.io

[email protected]

@julienplu

http://www.slideshare.net/julienplu