Entity Typing and Event Extraction

Marieke van Erp

http://mariekevanerp.com

VU WP3 team

Isa Maks, Antske Fokkens, Marieke van Erp, Piek Vossen

Entity typing

What is entity typing?

• Entity typing is the task of classifying an entity mention into a type, such as person, location or organisation, or a finer-grained category

• An entity mention is a recognised name in a text that refers to a real-world person, location, organisation or other interesting ‘thing’

What is the added value of entity typing?

• It allows you to query for fine-grained entity types: give me all electricians in the dataset, give me all historic buildings

• Entity typing often includes linking an entity to background knowledge

• The background knowledge provides additional filters: give me all politicians born after 1900 in the dataset

• Caveat: the background knowledge is not complete
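A minimal sketch of how typed and linked entities could be queried, assuming a hypothetical in-memory record per entity. The class `Entity`, its fields `types` and `birth_year`, and the sample records are invented for illustration and are not the project's data model. Missing background knowledge is modelled as `None`, which shows the caveat above in practice: an entity without a known birth year cannot match a date filter.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Entity:
    """A recognised entity plus (possibly incomplete) background knowledge."""
    name: str
    types: set = field(default_factory=set)  # fine-grained types, e.g. {"politician"}
    birth_year: Optional[int] = None          # None = absent from the background knowledge


def query(entities, entity_type, born_after=None):
    """Return entities of the given type, optionally filtered on birth year."""
    hits = []
    for e in entities:
        if entity_type not in e.types:
            continue
        if born_after is not None:
            # Entities with an unknown birth year are skipped, not guessed.
            if e.birth_year is None or e.birth_year <= born_after:
                continue
        hits.append(e)
    return hits


# Hypothetical sample data.
entities = [
    Entity("Person A", {"politician"}, 1925),
    Entity("Person B", {"politician"}, 1880),
    Entity("Person C", {"politician"}),  # birth year unknown
    Entity("Person D", {"electrician"}, 1950),
]

politicians_after_1900 = query(entities, "politician", born_after=1900)
print([e.name for e in politicians_after_1900])
```

Person C is a politician in this toy data but is not returned, because the background knowledge does not say when they were born.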

New synonym/concept lists are easy to plug in

Brouwers:

concept-100350 (ponstypiste) isRelatedTo class-Schrijfkunst

concept-100343 (tachygraaf) isRelatedTo class-Schrijfkunst

concept-100313 (schrijver) isRelatedTo class-Schrijfkunst .
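To illustrate how such a list could be plugged in, the sketch below parses lines in the format shown above into a label lookup and uses it to tag tokens. The regular expression, the function names and the sample sentence are assumptions made for this example; the actual concept linkers are not shown here.

```python
import re

# The three Brouwers lines from the slide, in the same textual format.
BROUWERS = """
concept-100350 (ponstypiste) isRelatedTo class-Schrijfkunst
concept-100343 (tachygraaf) isRelatedTo class-Schrijfkunst
concept-100313 (schrijver) isRelatedTo class-Schrijfkunst .
"""

LINE = re.compile(r"(concept-\d+) \((.+)\) (\w+) (class-\S+)$")


def load_lexicon(text):
    """Map each label to (concept id, class) for every parsable line."""
    lexicon = {}
    for raw in text.splitlines():
        line = raw.strip().rstrip(".").strip()  # drop the optional trailing " ."
        match = LINE.match(line)
        if match:
            concept_id, label, _relation, cls = match.groups()
            lexicon[label.lower()] = (concept_id, cls)
    return lexicon


def tag(tokens, lexicon):
    """Return (token, concept id, class) for every token found in the lexicon."""
    return [(t, *lexicon[t.lower()]) for t in tokens if t.lower() in lexicon]


lexicon = load_lexicon(BROUWERS)
# Hypothetical input sentence, split on whitespace only.
print(tag("De schrijver en de tachygraaf".split(), lexicon))
```

Adding another concept list then amounts to loading another file into the same lookup.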

Named Entity Recognition & Linking

• We are creating links between HISCO and Brouwers

• We are building on entity and concept linkers that can recognise concepts from HISCO and Brouwers in texts

• We are developing a new general-purpose entity linker that can use datasets other than DBpedia and is less sensitive to general entity popularity

• Discovering more about Dark and NIL entities is also ongoing work (cf. Van Erp & Vossen (2016), Entity Typing using Distributional Semantics and DBpedia. To appear in: Proceedings of the 4th NLP&DBpedia workshop, Kobe, Japan, 18 October 2016)

Event Extraction

• Event Extraction is the task of recognising and classifying mentions of ‘things that happen’ in text

• Events are multifaceted: they take place at a certain time and place and have participants involved

• By recognising participants, times and places, we can generate event descriptions and compare events
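As a rough illustration of comparing event descriptions, the sketch below stores type, participants, time and place per event and applies a simple matching rule. The class `EventDescription`, the rule in `likely_same_event` and the sample events are assumptions for this example only, not the method used in the project.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class EventDescription:
    event_type: str
    participants: frozenset
    time: Optional[str] = None   # ISO date, None if not recognised
    place: Optional[str] = None  # None if not recognised


def likely_same_event(a, b):
    """Heuristic: same type, no conflicting time or place, shared participant."""
    if a.event_type != b.event_type:
        return False
    # A facet only rules out a match when both sides know it and disagree.
    for x, y in ((a.time, b.time), (a.place, b.place)):
        if x is not None and y is not None and x != y:
            return False
    return bool(a.participants & b.participants)


# Hypothetical event descriptions extracted from three texts.
e1 = EventDescription("meeting", frozenset({"Person A", "Person B"}), "1950-03-01", "Amsterdam")
e2 = EventDescription("meeting", frozenset({"Person A"}), "1950-03-01")
e3 = EventDescription("meeting", frozenset({"Person A"}), "1951-01-01")

print(likely_same_event(e1, e2))  # same date, shared participant, place unknown in e2
print(likely_same_event(e1, e3))  # dates conflict
```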

From words to concepts

• Linking terms to synonyms to obtain a higher level of abstraction

• Word-sense disambiguation + WordNet + Multilingual Central Repository + FrameNet + PropBank

• Stop, quit, leave, relinquish, bow out -> all linked to the concept wn:leave_office
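The mapping in the last bullet can be sketched as a plain lookup table. This deliberately skips word-sense disambiguation: in running text a word such as "stop" only refers to leaving office in some contexts, so the sketch assumes the terms have already been disambiguated. The function name `to_concept` and the sample mentions are invented for illustration.

```python
from collections import Counter

# Terms from the slide, all linked to the same concept.
TERM_TO_CONCEPT = {
    term: "wn:leave_office"
    for term in ("stop", "quit", "leave", "relinquish", "bow out")
}


def to_concept(term):
    """Return the concept for an (already disambiguated) term, or None."""
    return TERM_TO_CONCEPT.get(term.strip().lower())


# Hypothetical disambiguated mentions from a corpus.
mentions = ["quit", "Relinquish", "bow out", "arrive"]
counts = Counter(c for c in map(to_concept, mentions) if c is not None)
print(counts)
```

Counting at the concept level groups three different surface forms under one entry, which is the higher level of abstraction the slide refers to.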

Why link to WordNet/ConceptNet/etc.?

• It allows you to query for types rather than instances: give me all lawsuits in the dataset

• In the context of CLARIAH, we are converting various diachronous lexicons to Linked Data

• integrate resources

• tag interesting concepts in text

• query expansion

Semantic Role Labelling

• Detecting the agent, patient, recipient and theme of a sentence

• Mary sold the book to John

• Agent: Mary

• Recipient: John

• Theme: the book
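The example above could be represented as follows. This is only a sketch of one possible output structure, not an actual role labeller; the dictionary layout and the helper `role_spans` are assumptions made for illustration.

```python
sentence = "Mary sold the book to John"

# Hand-written annotation mirroring the slide; a real system would predict this.
frame = {
    "predicate": "sold",
    "roles": {"Agent": "Mary", "Theme": "the book", "Recipient": "John"},
}


def role_spans(sentence, frame):
    """Return character offsets (start, end) for each role filler."""
    spans = {}
    for role, filler in frame["roles"].items():
        start = sentence.find(filler)
        if start == -1:
            raise ValueError(f"filler {filler!r} not found in sentence")
        spans[role] = (start, start + len(filler))
    return spans


spans = role_spans(sentence, frame)
for role, (start, end) in spans.items():
    print(role, sentence[start:end], (start, end))
```

Storing offsets rather than strings keeps each role tied to its exact position in the text, which matters when the same name occurs more than once.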

[Figure: two news headlines describing the same event are merged into one event instance. Headlines: "Qatar Holding sells 10% stake in Porsche to founding families" and "Porsche family buys back 10pc stake from Qatar". Sources shown: http://english.alarabiya.net and http://www.telegraph.co.uk. Event node: Event12 (buy/sell), type fn:Commerce_money_transfer. Role labels: fn:Seller, fn:Buyer, fn:Goods, fn:Money. Entities: dbp:Porsche_family, dbp:QatarHolding, ?Entity23 (10% stake). Time: sem:hasTime 2013-06-17.]

Event abstractions

• Enable searches such as: Give me all lawsuits in which a politician was involved between 1990 and 2000.

• Current developments: expand resources to the historical domain, devise new crystallisation strategies for aggregating event information
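A search like the one in the first bullet combines an event type, an entity type and a time range. The sketch below shows this over a hypothetical in-memory list of events; the classes, field names and sample data are assumptions for illustration and say nothing about how the project stores or queries its output.

```python
from dataclasses import dataclass


@dataclass
class Participant:
    name: str
    types: frozenset  # entity types, e.g. frozenset({"politician"})


@dataclass
class Event:
    event_type: str
    year: int
    participants: list


def find_events(events, event_type, participant_type, start, end):
    """Events of a type, within [start, end], with at least one participant of a type."""
    return [
        ev for ev in events
        if ev.event_type == event_type
        and start <= ev.year <= end
        and any(participant_type in p.types for p in ev.participants)
    ]


politician = Participant("Person A", frozenset({"politician"}))
electrician = Participant("Person B", frozenset({"electrician"}))

# Hypothetical extracted events.
events = [
    Event("lawsuit", 1995, [politician, electrician]),
    Event("lawsuit", 1995, [electrician]),
    Event("lawsuit", 2005, [politician]),
    Event("election", 1994, [politician]),
]

hits = find_events(events, "lawsuit", "politician", 1990, 2000)
print([(ev.event_type, ev.year) for ev in hits])
```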

Find out more

• All modules and evaluations are described in: http://kyoto.let.vu.nl/newsreader_deliverables/NWR-D4-2-3.pdf (158 pages!)

• Selection to be adapted within CLARIAH: https://github.com/CLARIAH/wp3-semantic-parsing-Dutch

• New developments: http://www.clariah.nl & https://github.com/clariah

Discussion

• It’s research software (no fancy interface)

• Currently not adapted to deal with old spelling variants, OCR errors, etc.

• NLP isn’t perfect (but humans don’t always agree either!)

• What would it take for you to start using such tools?

• What types of analyses are most interesting to the community?

• What use cases are most useful to the community at this point in time?
