Entity typing and event extraction
VU WP3 team
Isa Maks, Antske Fokkens, Marieke van Erp, Piek Vossen
Entity typing
What is entity typing?
• Entity typing is the task of assigning a type to an entity mention
• An entity mention is a recognised name in a text that refers to a real-world person, location, organisation or other interesting ‘thing’
What is the added value of entity typing?
• It allows you to query for fine-grained entity types: give me all electricians in the dataset, give me all historic buildings
• Entity typing often includes linking an entity to background knowledge
• The background knowledge provides additional filters: give me all politicians born after 1900 in the dataset
• Caveat: the background knowledge is not complete
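The two kinds of query above, and the caveat about incomplete background knowledge, can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `Entity` record, the names "Person A" to "Person D" and their birth years are invented and do not come from any CLARIAH dataset. The point is only that entities with missing background knowledge should be reported separately rather than silently dropped.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entity:
    mention: str               # name as recognised in the text
    types: frozenset           # fine-grained types assigned by the typer
    birth_year: Optional[int]  # from background knowledge; None if unknown


# Hypothetical records, invented for this example only.
ENTITIES = [
    Entity("Person A", frozenset({"Person", "Politician"}), 1923),
    Entity("Person B", frozenset({"Person", "Politician"}), 1875),
    Entity("Person C", frozenset({"Person", "Politician"}), None),
    Entity("Person D", frozenset({"Person", "Electrician"}), 1950),
]


def of_type(entities, wanted):
    """Query by fine-grained type, e.g. 'give me all electricians'."""
    return [e for e in entities if wanted in e.types]


def born_after(entities, year):
    """Filter on background knowledge, keeping unknowns apart."""
    matched, unknown = [], []
    for e in entities:
        if e.birth_year is None:
            unknown.append(e)   # background knowledge is incomplete
        elif e.birth_year > year:
            matched.append(e)
    return matched, unknown


politicians = of_type(ENTITIES, "Politician")
matched, unknown = born_after(politicians, 1900)
print([e.mention for e in matched])
print([e.mention for e in unknown])
```

Returning the `unknown` list alongside the matches makes the gap in the background knowledge visible to whoever runs the query.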
New synonym/concept lists are easy to plug in
Brouwers:
concept-100350 (ponstypiste) isRelatedTo class-Schrijfkunst
concept-100343 (tachygraaf) isRelatedTo class-Schrijfkunst
concept-100313 (schrijver) isRelatedTo class-Schrijfkunst .
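As a rough illustration of what "plugging in" such a list could look like, the Python sketch below parses lines in the shape shown for Brouwers and uses them to tag tokens. The regular expression, the function names `load_concepts` and `tag`, and the test sentence are assumptions for this example; the actual CLARIAH linkers may read a different serialisation and do far more than exact string matching.

```python
import re
from collections import defaultdict

# The three Brouwers lines from the slide, without the trailing " ."
LINES = """\
concept-100350 (ponstypiste) isRelatedTo class-Schrijfkunst
concept-100343 (tachygraaf) isRelatedTo class-Schrijfkunst
concept-100313 (schrijver) isRelatedTo class-Schrijfkunst
""".splitlines()

# Assumed line shape: <concept id> (<label>) isRelatedTo <class id>
PATTERN = re.compile(r"^(concept-\d+)\s+\((.+?)\)\s+isRelatedTo\s+(class-\S+)$")


def load_concepts(lines):
    """Build label -> (concept id, class) and class -> labels indexes."""
    label_to_class = {}
    class_to_labels = defaultdict(list)
    for line in lines:
        m = PATTERN.match(line.strip())
        if m is None:
            continue  # skip lines that do not follow the assumed shape
        concept_id, label, cls = m.groups()
        label_to_class[label] = (concept_id, cls)
        class_to_labels[cls].append(label)
    return label_to_class, dict(class_to_labels)


def tag(tokens, label_to_class):
    """Return (token, class) pairs for tokens found in the concept list."""
    return [
        (t, label_to_class[t.lower()][1])
        for t in tokens
        if t.lower() in label_to_class
    ]


label_to_class, class_to_labels = load_concepts(LINES)
tagged = tag("De schrijver en de tachygraaf".split(), label_to_class)
print(tagged)
```

Swapping in another list then only means passing different lines to `load_concepts`, provided they follow the same shape.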
Named Entity Recognition & Linking
• We are creating links between HISCO and Brouwers
• We are building on entity and concept linkers that can recognise concepts from HISCO and Brouwers in texts
• We are developing a new general-purpose entity linker that allows for use of datasets other than DBpedia and is less sensitive to general entity popularity
• Work on discovering more about Dark and NIL entities is also ongoing (cf. Van Erp & Vossen (2016), Entity Typing using Distributional Semantics and DBpedia. To appear in: Proceedings of the 4th NLP&DBpedia workshop, Kobe, Japan, 18 October 2016)
Event Extraction
• Event Extraction is the task of recognising and classifying mentions of ‘things that happen’ in text
• Events are multifaceted: they take place at a certain time and place and have participants involved
• By recognising participants, times and places, we can generate event descriptions and compare events
From words to concepts
• Linking terms to synonyms to obtain a higher level of abstraction
• Word-sense disambiguation + WordNet + Multilingual Central Repository + FrameNet + PropBank
• Stop, quit, leave, relinquish, bow out → all linked to the concept wn:leave_office
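The abstraction step can be pictured as a lookup from lemmas to concept identifiers. The Python sketch below hard-codes the five expressions from the slide; the table `CONCEPT_OF` and the function `abstract` are illustrative names, and the sketch deliberately skips word-sense disambiguation, which a real pipeline needs because a word such as "leave" has many other senses.

```python
from collections import Counter

# Lemma -> concept table for the expressions listed on the slide.
# A real system would choose the sense in context; this table assumes
# every occurrence has the "leave office" sense.
CONCEPT_OF = {
    lemma: "wn:leave_office"
    for lemma in ("stop", "quit", "leave", "relinquish", "bow out")
}


def abstract(lemmas):
    """Replace each known lemma by its concept; keep unknown lemmas as-is."""
    return [CONCEPT_OF.get(lemma.lower(), lemma) for lemma in lemmas]


# Counting at concept level groups mentions that use different words.
concepts = abstract(["quit", "relinquish", "bow out", "sell"])
counts = Counter(c for c in concepts if c.startswith("wn:"))
print(counts)
```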
Why link to WordNet/ConceptNet/etc?
• It allows you to query for types rather than instances: give me all lawsuits in the dataset
• In the context of CLARIAH, we are converting various diachronic lexicons to Linked Data
• integrate resources
• tag interesting concepts in text
• query expansion
Semantic Role Labelling
• Detecting the agent, patient, recipient and theme of a sentence
• Mary sold the book to John
• Agent: Mary
• Recipient: John
• Theme: the book
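To make the shape of the output concrete, here is a toy Python sketch that recovers the three roles for the example sentence. It is not how semantic role labelling is done in practice: real labellers rely on syntactic analysis and trained models, whereas this sketch only matches one hand-written pattern ("X sold Y to Z"). The pattern `TRANSFER`, the short verb list and the function `label_roles` are all assumptions for illustration.

```python
import re
from typing import Dict, Optional

# Toy pattern for transfer sentences of the form
# "<agent> <verb> <theme> to <recipient>".
TRANSFER = re.compile(
    r"^(?P<agent>.+?)\s+(?P<predicate>sold|gave|sent)\s+"
    r"(?P<theme>.+?)\s+to\s+(?P<recipient>.+?)\.?$"
)


def label_roles(sentence: str) -> Optional[Dict[str, str]]:
    """Return a role -> text mapping, or None if the pattern does not apply."""
    m = TRANSFER.match(sentence.strip())
    return m.groupdict() if m else None


roles = label_roles("Mary sold the book to John")
print(roles)
```

A sentence that does not fit the pattern, such as "The book was sold", simply yields `None`, which is exactly where a trained labeller is needed.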
[Figure: two news reports about the same sale, merged into one event description]
• Source 1: “Qatar Holding sells 10% stake in Porsche to founding families” (http://english.alarabiya.net)
• Source 2: “Porsche family buys back 10pc stake from Qatar” (http://www.telegraph.co.uk)
• Event12 (buy/sell), type fn:Commerce_money_transfer, sem:hasTime 2013-06-17
• fn:Seller: dbp:QatarHolding
• fn:Buyer: dbp:Porsche_family
• fn:Goods: ?Entity23 (10% stake)
• fn:Money: no value shown
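The Qatar Holding and Porsche example suggests how two reports can be recognised as describing one event. The Python sketch below shows one simple way to compare and merge event mentions: same frame, same time, and no conflicting role fillers. It is an assumption made for illustration, not the project's actual aggregation method, and the per-source role assignments (in particular that the second report lacks the goods role) are invented to show how merging fills gaps.

```python
from dataclasses import dataclass


@dataclass
class EventMention:
    source: str   # where the mention was found
    frame: str    # FrameNet frame of the event
    time: str     # normalised date (sem:hasTime)
    roles: dict   # frame element -> filler


# Role assignments per source are assumed for this example.
m1 = EventMention(
    "http://english.alarabiya.net",
    "fn:Commerce_money_transfer",
    "2013-06-17",
    {"fn:Seller": "dbp:QatarHolding",
     "fn:Buyer": "dbp:Porsche_family",
     "fn:Goods": "?Entity23"},
)
m2 = EventMention(
    "http://www.telegraph.co.uk",
    "fn:Commerce_money_transfer",
    "2013-06-17",
    {"fn:Buyer": "dbp:Porsche_family",
     "fn:Seller": "dbp:QatarHolding"},
)


def compatible(a, b):
    """Same frame and time, and no role filled differently in a and b."""
    if a.frame != b.frame or a.time != b.time:
        return False
    shared = a.roles.keys() & b.roles.keys()
    return all(a.roles[r] == b.roles[r] for r in shared)


def merge(a, b):
    """Combine two compatible mentions into one event description."""
    return {
        "frame": a.frame,
        "time": a.time,
        "roles": {**a.roles, **b.roles},
        "sources": [a.source, b.source],
    }


merged = merge(m1, m2) if compatible(m1, m2) else None
print(merged)
```

Keeping both sources on the merged event preserves provenance, so a user can still see which report contributed to the description.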
Event abstractions
• Enable searches such as: Give me all lawsuits in which a politician was involved between 1990 and 2000.
• Current developments: expand resources to the historic domain, devise new crystallisation strategies for aggregating event information
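The search mentioned above combines three abstractions: an event concept, a participant type and a time range. A minimal Python sketch of such a query follows. The `Event` record and the four events are invented for illustration, and the date range is assumed to be inclusive on both ends; in the project itself such queries would run over Linked Data rather than an in-memory list.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Event:
    event_id: str
    concept: str             # abstract event type, e.g. "Lawsuit"
    when: date
    participant_types: dict  # participant name -> set of entity types


# Hypothetical events, invented for this example only.
EVENTS = [
    Event("ev1", "Lawsuit", date(1994, 5, 2),
          {"Person A": {"Politician"}, "Org X": {"Company"}}),
    Event("ev2", "Lawsuit", date(2005, 1, 10),
          {"Person B": {"Politician"}}),
    Event("ev3", "Lawsuit", date(1996, 3, 8),
          {"Person C": {"Electrician"}}),
    Event("ev4", "Election", date(1998, 5, 6),
          {"Person A": {"Politician"}}),
]


def find_events(events, concept, participant_type, start, end):
    """Events of a concept, within [start, end], with a typed participant."""
    return [
        e for e in events
        if e.concept == concept
        and start <= e.when <= end
        and any(participant_type in types
                for types in e.participant_types.values())
    ]


hits = find_events(EVENTS, "Lawsuit", "Politician",
                   date(1990, 1, 1), date(2000, 12, 31))
print([e.event_id for e in hits])
```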
Find out more
• All modules and evaluations are described in: http://kyoto.let.vu.nl/newsreader_deliverables/NWR-D4-2-3.pdf (158 pages!)
• Selection to be adapted within CLARIAH: https://github.com/CLARIAH/wp3-semantic-parsing-Dutch
• New developments: http://www.clariah.nl & https://github.com/clariah
Discussion
• It’s research software (no fancy interface)
• Currently not adapted to deal with old spelling variants, OCR errors, etc.
• NLP isn’t perfect (but humans don’t always agree either!)
• What would it take for you to start using such tools?
• What types of analyses are most interesting to the community?
• What use cases are most useful to the community at this point in time?