named entity recognition and linking...2018/01/24  · definitions •entity: something that exists...

24
Named Entity Recognition and Linking Artūrs Znotiņš, [email protected] Supervisor: Guntis Bārzdiņš, Dr.sc.comp.

Upload: others

Post on 01-Jan-2021

15 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Named Entity Recognition and Linking...2018/01/24  · Definitions •Entity: something that exists by itself •Named entity (NE):entity with specific name •Mention: phrase that

NamedEntityRecognitionandLinking

ArtūrsZnotiņš,[email protected]

Supervisor:Guntis Bārzdiņš,Dr.sc.comp.

Page 2: Named Entity Recognition and Linking...2018/01/24  · Definitions •Entity: something that exists by itself •Named entity (NE):entity with specific name •Mention: phrase that

TheProblem

• Largeamountsofunstructureddata– naturallanguage(WWW,books,newspapers,radio)• Alotofambiguity- contextisveryimportant• Humansaregoodatsemanticdisambiguation:• Whatentitiesdoestextreferto?• Whatfactsdoestextdescribe?• Whatisthemeaningofspecificword?

• Howtodothisautomatically?

2

Page 3: Named Entity Recognition and Linking...2018/01/24  · Definitions •Entity: something that exists by itself •Named entity (NE):entity with specific name •Mention: phrase that

Motivation

• UtilizeexistingunstructurednaturallanguageresourcesHiddenknowledge

• Usecases:• Informationextraction(IE)• Textclustering• Mediamonitoring• Dialogsystems• Questionanswering• Machinetranslation

3

Cikos atiet vilciens uz Daugavpili?

Page 4: Named Entity Recognition and Linking...2018/01/24  · Definitions •Entity: something that exists by itself •Named entity (NE):entity with specific name •Mention: phrase that

Definitions

• Entity:somethingthatexistsbyitself• Namedentity(NE): entitywithspecificname• Mention:phrasethatreferstoentity• Knowledgebase(KB):organizedrepositoryofknowledgeconsistingofconcepts,properties,links• (Named)EntityLinking(NEL):linkingmentionsofentitieswithinatexttoKBentities• WordSenseDisambiguation(WSD):assigningmeaningstowordoccurrenceswithintext

4

Page 5: Named Entity Recognition and Linking...2018/01/24  · Definitions •Entity: something that exists by itself •Named entity (NE):entity with specific name •Mention: phrase that

KnowledgeBase:Example

5https://www.wikidata.org/wiki/Wikidata:Introduction

Page 6: Named Entity Recognition and Linking...2018/01/24  · Definitions •Entity: something that exists by itself •Named entity (NE):entity with specific name •Mention: phrase that

KnowledgeBase:Example

6https://www.wikidata.org/w/index.php?search=&search=andris+berzins

Page 7: Named Entity Recognition and Linking...2018/01/24  · Definitions •Entity: something that exists by itself •Named entity (NE):entity with specific name •Mention: phrase that

NamedEntityLinking:Example

“Arī otra figūra Daimlerlietā ir Bojāra ārštatapadomnieks,unsens eksmēra draugs noarmijas laikiem

– Armands Zeihmanis.”(notvnet.lv)

• Bojāra→GundarsBojārs,dz.1967https://lv.wikipedia.org/wiki/Gundars_Bojārs

• Otra figūra=Bojāra =eksmēra

• Bojārs↔Zeihmanis:draugs,padomnieks

7

Page 8: Named Entity Recognition and Linking...2018/01/24  · Definitions •Entity: something that exists by itself •Named entity (NE):entity with specific name •Mention: phrase that

NEL:GeneralApproach

• NamedEntityRecognition• CandidateSelection

SelectingpossiblecandidatesfromthetargetKnowledgeBase

• DisambiguationDecidingwhichcandidateisthecorrectidentitycorrespondingtothementionofaNamedEntity

8

Page 9: Named Entity Recognition and Linking...2018/01/24  · Definitions •Entity: something that exists by itself •Named entity (NE):entity with specific name •Mention: phrase that

NEL:ContextFreeApproach

• ExtractsurfaceformsfromKBorannotatedcorpus• DBpedia labels(rathersparse)• InternallinksofWikipedia

• Cleanandcatalog• Faststringmatch

+ Simple+ Highprecision- Lowrecall- Doesnotsolveambiguities

9

𝑈"# = 𝑒# ∈ 𝐸"#: 𝐶 𝑒# ≥ 𝛼×-𝐶 𝑒.

/01

.

Generatenamealternatives

Decideonwhichsurfaceformshaveambiguouslabelswhichcannotbeconsideredwithoutcontext

Page 10: Named Entity Recognition and Linking...2018/01/24  · Definitions •Entity: something that exists by itself •Named entity (NE):entity with specific name •Mention: phrase that

NEL:KnowledgeRichApproach

• Stringmatchbasedcandidateselection• Mentionattributes:

• Bag-of-name-words• Bag-of-mention-words• Bag-of-context-words

• Cosinesimilaritybetweenvectors

10

Compatibilityfunction

Entity(KB)

Subentity (documentlevel)

Mentions+ Cansolveambiguities+ KBcanbeenrichedwithvalidatedentities- Performancedependsonpre-processingcomponents

Wicketal,2013;Stoyanov etal,2012

Page 11: Named Entity Recognition and Linking...2018/01/24  · Definitions •Entity: something that exists by itself •Named entity (NE):entity with specific name •Mention: phrase that

NEL:ProcessingPipeline

11

Morphologyandsyntax

NER

Knowledgebase

Co-referenceresolution

Annotateddocument

EntityLinking

Textdocument

Page 12: Named Entity Recognition and Linking...2018/01/24  · Definitions •Entity: something that exists by itself •Named entity (NE):entity with specific name •Mention: phrase that

MorphologyandSyntax

• Lexiconbasedmorphologicalanalyzer• CMMmorphologicaltagger• LSTMtransition-basedparser• Wordembeddings• CharacterLSTMrepresentation

12

Model UD(UAS, %)

LSTM+ WE 75.1

LSTM+WE+morpho-tag 75.4

• Inflectiongeneration• Mentioncandidate

selection

Paikens etal,2013;Pretkalniņa etal,2016

Page 13: Named Entity Recognition and Linking...2018/01/24  · Definitions •Entity: something that exists by itself •Named entity (NE):entity with specific name •Mention: phrase that

NamedEntityRecognition

13

Model NER(F1, %)

BaselineCRF 83.4

LSTM-CRF 63.7

LSTM-CRF+WE 90.5

Page 14: Named Entity Recognition and Linking...2018/01/24  · Definitions •Entity: something that exists by itself •Named entity (NE):entity with specific name •Mention: phrase that

NER:LearningCurve

0% 20% 40% 60% 80% 100%

Corpus fraction used for training

50%

60%

70%

80%

90%

100%

F1-

scor

e

58.8%

66.0%

57.1%

53.9%

80.1%

62.6%

57.1%

84.5%

72.1%

60.0%

86.6%

83.4%

63.7%

90.2%

CRF

LSTM-CRF

LSTM-CRF + embeddings

14

Page 15: Named Entity Recognition and Linking...2018/01/24  · Definitions •Entity: something that exists by itself •Named entity (NE):entity with specific name •Mention: phrase that

NamedEntityrecognition

15

Bi-directionalLSTM-CRF• Pre-trainedwordembeddings• CharacterLSTMorCNNrepresentations• Jointcross-labelmodelling

CRFlayer

LSTM

Wordembeddings

Characterlevelwordvector

Lample etal,2015

Page 16: Named Entity Recognition and Linking...2018/01/24  · Definitions •Entity: something that exists by itself •Named entity (NE):entity with specific name •Mention: phrase that

Co-referenceresolution

• Mentionsthatreferstothesamerealwordentity

16

Followinginthefootstepsoftheir fatherDainis,anex-bobsleighspecialist,brothersMartins andTomassDukurs arecurrentlyrankedamongtheglobe’stopskeletonracers.Martins,worldnumberone,isthegoldmedalfavourite.He dreamsofsharingthepodiumwithhis eldersibling.

• {theirfatherDainis,anex-bobsleighspecialist}• {their,brothersMartinsandTomass Dukurs }• {Martins,Martins,worldnumberone,thegoldmedalfavourite,He,his}• {Tomass Dukurs,hiseldersibling}

Page 17: Named Entity Recognition and Linking...2018/01/24  · Definitions •Entity: something that exists by itself •Named entity (NE):entity with specific name •Mention: phrase that

Co-referenceResolution

• MentiondetectionNE,pronouns,nounphrases,etc.

• Mentionchaining– clustering• Grammaticallyrelatedmentions• Salience(gender,number,person,etc.)• Pairwiseorclusterbased

• Representativenameandtype(person,etc.)

17

Page 18: Named Entity Recognition and Linking...2018/01/24  · Definitions •Entity: something that exists by itself •Named entity (NE):entity with specific name •Mention: phrase that

LVCoref:RuleBasedSystem

• Exactstringmatch• Precisegrammaticalconstructions• Appositives (AndrisBērziņš,thepresidentofLatvia)• Nominalpredicatives (Jānis Bērziņšisaprofessor)• Acronyms

• Headmatchvariations (SupremeCourtoftheRepublicofLatvia↔SupremeCourtofLatvia)

• Pronounanaphora

18

Page 19: Named Entity Recognition and Linking...2018/01/24  · Definitions •Entity: something that exists by itself •Named entity (NE):entity with specific name •Mention: phrase that

LVCoref:Results

19

F1 P R

Predictedmentions

Exactmatch 40.0 74.2 28.0

+Preciseconstruction 46.2 74.5 34.3

+Headmatch 56.2 69.4 47.3

+Pronouns 58.0 65.1 52.3

Goldmentions

Exactmatch 42.8 86.0 29.2

+Preciseconstruction 45.4 98.4 29.5

+Headmatch 66.8 88.5 54.1

+Pronouns 76.5 87.0 68.9

Page 20: Named Entity Recognition and Linking...2018/01/24  · Definitions •Entity: something that exists by itself •Named entity (NE):entity with specific name •Mention: phrase that

LVCoref: Components

0.3

0.9

0.1

0.7

1.12.3

-1 0 1 2 3 4

Gender

Number

Syntacticalconstraints

Entitylevelconstraints

Namedentitylabels

Syntacticalinformation

pp

ΔR ΔP ΔF120

Page 21: Named Entity Recognition and Linking...2018/01/24  · Definitions •Entity: something that exists by itself •Named entity (NE):entity with specific name •Mention: phrase that

DataannotationforLatvian• SelectedparagraphsfromTheBalancedCorpusofModernLatvian

• NER/NEL• 8categories:person,organization,location,GPE,product,event,time,entity

• DerivedfromMUCandAMRguidelines• LinkstoWikipedia

• Co-references:• Namedentities,nounphrasesandpronounsreferringtospecificentities

• DerivedfromOntoNotes 6.0guidelines• Alsosyntax,frames,AMR

Projects:• Tekstaautomātiskasdatorlingvistikas analīzespētījumsjaunainformācijasarhīvaproduktaizstrādē(2013-2014)

• Daudzslāņu valodas resursu kopa teksta semantiskajai analīzei unsintēzei latviešu valodā (2016-2019) 21

Page 22: Named Entity Recognition and Linking...2018/01/24  · Definitions •Entity: something that exists by itself •Named entity (NE):entity with specific name •Mention: phrase that

FutureWork

• Moreannotateddata• ImprovementiondetectionforLatvian• HighimpactonCR• Jointlearningwithshallowsyntaxparsing

• Namedentityrecognition• Hierarchicalnamedentitymentions• IncorporateneuralLSTMcharacterlevellanguagemodel

• Co-referenceresolutionJointlearningwithNEL

• NamedentityrecognitioninspeechSubword andclassbasedlanguagemodels

• Cross-lingualnamedentitylinking22

Page 23: Named Entity Recognition and Linking...2018/01/24  · Definitions •Entity: something that exists by itself •Named entity (NE):entity with specific name •Mention: phrase that

Publikācijas• Miranda,S.,Znotins,A.,Cohen,S.B.,Barzdins,G.(2017).MultilingualClusteringofStreamingNews.SubmittedtoNAACLHLT.

• Znotiņš,A.(2016).WordEmbeddingsfor Latvian NaturalLanguage ProcessingTools.InHumanLanguageTechnologies:TheBalticPerspective,IOSPress.WebofScience.

• Znotiņš,A.,Polis,K.,andDarģis R.(2015).MediamonitoringsystemforLatvianradioandTVbroadcasts.InProceedingsofthe16thAnnualConferenceoftheInternationalSpeechCommunicationAssociation(INTERSPEECH),732–733.WebofScience,Scopus.

• Darģis,R.andZnotiņš,A.(2014).BaselineforKeywordSpottinginLatvianBroadcastSpeech.InHumanLanguageTechnologies:TheBalticPerspective,IOSPress,75–82.WebofScience.

• Znotiņš,A.andPaikens,P.(2014).Coreference ResolutionforLatvian.InProceedingsoftheNinthInternationalConferenceonLanguageResourcesandEvaluation(LREC),3209–3213.WebofScience.

• Pretkalniņa,L.,Znotiņš,A.,Rituma,L.,Goško,D.(2014).Dependencyparsingrepresentationeffectsontheaccuracyofsemanticapplications— anexampleofaninflectivelanguage.InProceedingsoftheNinthInternationalConferenceonLanguageResourcesandEvaluation(LREC),4074–4081.WebofScience.

23

Page 24: Named Entity Recognition and Linking...2018/01/24  · Definitions •Entity: something that exists by itself •Named entity (NE):entity with specific name •Mention: phrase that

Summary

• NER/NEL:findallentitymentionswithinatextandlinkthemtoauthoritativedatasource(knowledgebase)• NELastextanchoring:• Coarsesummaryofwhattextisabout• OthertaskscanutilizeinformationfromKB• Othertaskscanenrichnamedentitieswithadditionalsemanticannotationlayers

24