nlp for semantic web
-
8/3/2019 Nlp for Semantic Web
Copyright 2009 Digital Enterprise Research Institute. All rights reserved
Digital Enterprise Research Institute www.deri.ie
Natural Language Processing
- for the Semantic Web -
Paul Buitelaar
-
Semantic Web Challenge: Legacy Data
[Figure: Linked Data; Legacy Data: unstructured, un-linked]
-
Overview of the Tutorial
Semantic Analysis of Unstructured Legacy Data
  Examples in: Semantic Search, Ontology-based Information Extraction, Ontology Learning
NLP Layer Cake with Pointers
  Part of Speech Tagging, Morphology, Phrase Structure
  Semantic Tagging: WordNet, FrameNet, Word Sense Disambiguation, Named Entities, Terms, Thesauri, Ontologies; Advanced Topic: Ontology-Lexicon Interface
  Grammatical Functions, Dependency Structure, Discourse Analysis
Further Relevant Pointers
  General Tools, Organizations, Conferences, Journals, Sites, Lists
-
What the Tutorial will not address
Machine Learning in/for Text Mining: Feature Extraction in Clustering, Classification
Text Mining in/for Information Retrieval
See the Tutorial on Information Mining by Conor Hayes; the DERI Stream on Semantic Information Mining brings together Natural Language Processing and Information Mining
-
Some Example Applications
Semantic Annotation & Search: MuchMore (2000-2003)
  Semantic annotation of a set of medical scientific abstracts & patient records as queries across languages (English, German)
Ontology Learning: OntoLT (2004-2005)
  Extraction of classes, subclasses and relations (object properties) from a linguistically annotated document set
Ontology-based Information Extraction: SmartWeb (2005-2008)
  Extraction of entities and events from a set of football match reports
Semantic Video Browsing: K-Space (2007-2008)
  Extraction of entities and events from a set of football match reports, aligned with football video, enabling semantic-level video indexing and browsing
OpenCalais, GIST
  Industrial-strength open source/commercial semantic annotation & retrieval
-
Semantic Annotation & Search: MuchMore
-
Ontology Learning: OntoLT
Linguistic Structure to Ontology Mapping Rules
-
Ontology Learning: OntoLT
Extract & Inspect Contexts
-
Ontology Learning: OntoLT
Extract Ontology Fragments
-
Ontology Learning: OntoLT
German Clinical Report:
An 40 Kniegelenkpräparaten wurden mittlere Patellarsehnendrittel mit einer neuen Knochenverblockungstechnik in einem zweistufigen Bohrkanal bzw. mit konventioneller Interferenzschraubentechnik femoral fixiert.
English Translation:
In 40 human cadaver knees, mid patellar ligament thirds were fixed with a trapezoid bone block on one side on the femoral side in a two-level drill hole, or with a conventional interference screw.
Linguistic Annotation (fragments)
-
Ontology-based Information Extraction: SmartWeb
Oliver Kahn konnte den Schuss von Beto halten.
Oliver Kahn could stop the shot by Beto.
Information Extraction → Ontology Population:
semistruct#Deutschland_vs_Brasilien_30_Juni_2002_18:00[
  sportevent#matchEvents -> soba#ID11 ].
soba#ID11:sportevent#Parry[
  sportevent#committedBy ->
  semistruct#Deutschland_vs_Brasilien_30_Juni_2002_18:00_Oliver_Kahn_PFP ].
-
Ontology-based Information Extraction: SmartWeb
-
Semantic Video Browsing: K-Space http://keg.vse.cz/wf/kspace/smil/
[Figure components: A/V Feature Analysis; Minute-by-Minute Match Reports; Extracted Entities and Events; Non-Linear Event and Entity Browsing]
-
Industrial Applications: GIST (CALAIS)
-
Open Calais Extracts Entities, Facts
-
Open Calais Extracts Entities, Facts
With a split decision in the final two primaries and a flurry of superdelegate endorsements, Sen. Barack Obama sealed the Democratic presidential nomination Tuesday night after a grueling, history-making campaign that will make him the first African American to head a major-party ticket.
Before a chanting, cheering audience in St. Paul, Minn., the first-term Illinois senator savored what once seemed an unlikely outcome to the Democratic race against Sen. Hillary Rodham Clinton. He now faces another hard-fought battle, against Sen. John McCain, the presumptive Republican candidate.
-
NLP: A Complete Example
He booked the large table in the corner. ... It was still available.
S: He booked the large table in the corner.
  NP "he": Subject, Agent; discourse referent X
    he: Pronoun, 3rd Person, Animate
  VP "booked the large table in the corner"
    book: Verb, Past, 3rd Person; Head, Predicate
    NP "the large table in the corner": Direct Object, Patient; Definite; discourse referent Y
      NP "the large table"
        large: Adjective; Modifier
        table: Noun, Singular; Head; furniture_01
      PP "in the corner"
        in: Preposition; Head, Predicate
        NP "the corner": Definite; discourse referent Z
S: It was still available.
  NP "it": Subject, Patient; discourse referent Y
    it: Pronoun, 3rd Person, Inanimate
  VP "was still available"
    be: Verb, Past, 3rd Person; Head, Predicate
    AdvP "still available"
-
NLP Layer Cake
Tokenization: [table] He booked the large table in the corner ...
PoS Tagging: [table:noun]
Morphological Analysis: [table~s] [work~ing] [Sommer~schule]
Semantic Tagging: [table:ARTIFACT, furniture_01]
Phrase Structure: [[[the][large][table]NP][[in][the][corner]PP]NP]
Dependency Structure: [[the:SPEC][large:MOD][table:HEAD]NP] [[He:SUBJ][booked:PRED][[this][table:HEAD]NP:DOBJ]S]
Discourse Analysis: [He:SUBJ][booked:PRED][[this][table:HEAD]NP:DOBJ:X1] [[It:SUBJ:X1][was:PRED][available]]
-
NLP Layers
Tokenization: Where are the words?
Part of Speech Tagging: Is this word a verb or a noun or something else?
Morphology: Can I split this word up?
Phrase Structure: Do these words go together?
Semantic Tagging: What objects are expressed by the words/phrases in the sentence?
Grammatical Functions & Dependency Structure: Which objects do what? And in relation to which others?
Discourse Analysis: Which events are expressed throughout a text/discourse? How do they interact? And which objects are involved?
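The first layer, tokenization ("Where are the words?"), can be sketched with one regular expression. This is a minimal sketch under our own assumptions: the pattern and the name `tokenize` are illustrative and do not reproduce any particular tool. A token is a run of word characters, optionally joined by internal hyphens or apostrophes, or else a single punctuation mark.

```python
import re

# Hypothetical token pattern: word characters with optional internal
# hyphens/apostrophes ("history-making", "I've"), or one punctuation mark.
TOKEN = re.compile(r"\w+(?:[-']\w+)*|[^\w\s]")


def tokenize(text):
    """Return the tokens of `text`, left to right."""
    return TOKEN.findall(text)


print(tokenize("He booked the large table in the corner."))
print(tokenize("Before a cheering audience in St. Paul, Minn."))
```

The second call shows why tokenization is not trivial: the abbreviations "St." and "Minn." are split from their periods, which real tokenizers avoid with abbreviation lists or trained models.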
-
Part-of-Speech Tagging
Annotate each word in a sentence with a part-of-speech (PoS) tag; this is useful for subsequent syntactic parsing
The most common PoS tag set for English is the Penn Treebank set of 45 tags, e.g.
  John/NNP saw/VBD the/DT saw/NN and/CC decided/VBD to/TO take/VB it/PRP to/TO the/DT table/NN.
Other tag sets are in use for other languages, e.g. the Stuttgart-Tübingen TagSet (STTS) for German
The challenge in part-of-speech tagging is ambiguity:
  like can be a Verb (I like/VBP candy.) or a Preposition (Time flies like/IN an arrow.)
  around can be a Preposition (I bought it at the shop around/IN the corner.), a Particle (I never got around/RP to getting a car.) or an Adverb (A new Prius costs around/RB $25K.)
Adapted from: http://www.cs.utexas.edu/~mooney/cs388/slides/pos-tagging.ppt
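Statistical taggers such as TnT resolve this kind of ambiguity with a Hidden Markov Model and Viterbi decoding. The sketch below is a toy bigram HMM: every probability in `TRANS` and `EMIT` is invented for illustration (a real tagger estimates them from a tagged corpus, and TnT itself uses trigrams with smoothing), and unseen events receive a small hypothetical `FLOOR` probability. It shows how the context picks VBP for "like" in one sentence and IN in the other.

```python
import math

TAGS = ["PRP", "NN", "NNS", "VBZ", "VBP", "IN", "DT"]

# P(tag | previous tag); "<s>" is the sentence start. Invented values.
TRANS = {
    ("<s>", "PRP"): 0.3, ("<s>", "NN"): 0.5, ("<s>", "DT"): 0.2,
    ("PRP", "VBP"): 0.8, ("PRP", "IN"): 0.1, ("PRP", "NN"): 0.1,
    ("NN", "VBZ"): 0.4, ("NN", "NNS"): 0.1, ("NN", "IN"): 0.3, ("NN", "NN"): 0.2,
    ("NNS", "VBP"): 0.5, ("NNS", "IN"): 0.2, ("NNS", "NN"): 0.3,
    ("VBZ", "IN"): 0.5, ("VBZ", "DT"): 0.3, ("VBZ", "NN"): 0.2,
    ("VBP", "NN"): 0.5, ("VBP", "DT"): 0.3, ("VBP", "IN"): 0.2,
    ("IN", "DT"): 0.7, ("IN", "NN"): 0.3,
    ("DT", "NN"): 0.9,
}
# P(word | tag) for lower-cased words. Invented values.
EMIT = {
    ("PRP", "i"): 0.5, ("NN", "time"): 0.01, ("NN", "candy"): 0.01,
    ("NN", "arrow"): 0.01, ("VBZ", "flies"): 0.02, ("NNS", "flies"): 0.01,
    ("VBP", "like"): 0.05, ("IN", "like"): 0.05, ("DT", "an"): 0.3,
}
FLOOR = 1e-12  # hypothetical probability for unseen transitions/emissions


def logp(table, key):
    return math.log(table.get(key, FLOOR))


def viterbi(words):
    """Return the most probable tag sequence for a list of lower-cased words."""
    # best[i][t] = (log prob of best path ending in tag t at word i, previous tag)
    best = [{t: (logp(TRANS, ("<s>", t)) + logp(EMIT, (t, words[0])), None) for t in TAGS}]
    for i in range(1, len(words)):
        column = {}
        for t in TAGS:
            prev, score = max(
                ((p, best[i - 1][p][0] + logp(TRANS, (p, t))) for p in TAGS),
                key=lambda pair: pair[1],
            )
            column[t] = (score + logp(EMIT, (t, words[i])), prev)
        best.append(column)
    # Follow the back-pointers from the best final tag.
    tag = max(TAGS, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


print(viterbi("i like candy".split()))
print(viterbi("time flies like an arrow".split()))
```

Working in log space avoids numeric underflow on long sentences; the decoder considers all tag sequences but only keeps the best path into each tag at each position.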
-
Part-of-Speech Tagging: PoS (English)
Noun (person, place or thing)
  Singular (NN): dog, fork
  Plural (NNS): dogs, forks
  Proper (NNP, NNPS): John, Springfields
  Personal pronoun (PRP): I, you, he, she, it
  Wh-pronoun (WP): who, what
Verb (actions and processes)
  Base, infinitive (VB): eat
  Past tense (VBD): ate
  Gerund (VBG): eating
  Past participle (VBN): eaten
  Non-3rd person singular present tense (VBP): eat
  3rd person singular present tense (VBZ): eats
  Modal (MD): should, can
  To (TO): to (to eat)
Adjective (modify nouns)
  Basic (JJ): red, tall
  Comparative (JJR): redder, taller
  Superlative (JJS): reddest, tallest
Adverb (modify verbs)
  Basic (RB): quickly
  Comparative (RBR): quicker
  Superlative (RBS): quickest
Preposition (IN): on, in, by, to, with
Determiner
  Basic (DT): a, an, the
  WH-determiner (WDT): which, that
Coordinating Conjunction (CC): and, but, or
Particle (RP): off (took off), up (put up)
Adapted from: http://www.cs.utexas.edu/~mooney/cs388/slides/pos-tagging.ppt
-
Part-of-Speech Tagging: Closed vs. Open
Closed class categories are composed of a small, fixed set of grammatical function words for a given language:
  Pronouns (it, he, she, ...), Prepositions (on, for, from, to, ...), Modals (will, can, may, ...), Determiners (a, the), Particles (to, up, off), Conjunctions (and, or)
Open class categories are composed of large sets of content words and are open to new additions:
  Nouns (a googler), Verbs (to google), Adjectives (geeky)
Adapted from: http://www.cs.utexas.edu/~mooney/cs388/slides/pos-tagging.ppt
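The closed/open distinction can be applied directly to Penn Treebank tags, for example to keep only the content words of a tagged sentence. A minimal sketch: the tag groupings follow the two slides above, except that adverbs (RB, RBR, RBS) are placed in the open class, which is our assumption and not stated on the slide; the function names are illustrative.

```python
# Coarse grouping of the Penn Treebank tags listed above.
# Assumption: adverbs (RB, RBR, RBS) count as open class.
OPEN_CLASS = {"NN", "NNS", "NNP", "NNPS", "VB", "VBD", "VBG", "VBN", "VBP", "VBZ",
              "JJ", "JJR", "JJS", "RB", "RBR", "RBS"}
CLOSED_CLASS = {"PRP", "WP", "MD", "TO", "IN", "DT", "WDT", "CC", "RP"}


def word_class(tag):
    """Return 'open', 'closed', or 'other' (e.g. punctuation) for a PTB tag."""
    if tag in OPEN_CLASS:
        return "open"
    if tag in CLOSED_CLASS:
        return "closed"
    return "other"


def content_words(tagged):
    """Keep only the open-class (content) words of a tagged sentence."""
    return [word for word, tag in tagged if word_class(tag) == "open"]


sentence = ("John/NNP saw/VBD the/DT saw/NN and/CC decided/VBD "
            "to/TO take/VB it/PRP to/TO the/DT table/NN")
tagged = [tuple(pair.rsplit("/", 1)) for pair in sentence.split()]
print(content_words(tagged))
```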
-
Part-of-Speech Tagging: State of the Art
Overview of available PoS taggers: http://www-nlp.stanford.edu/links/statnlp.html#Taggers
Many are widely used (often retrainable); a very small selection:
  TreeTagger, decision trees, free research license: http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger
  TnT, Thorsten Brants, HMM, free research license: http://www.coli.uni-saarland.de/~thorsten/tnt/
  ENGCG, lexicon & rules, commercial (LingSoft): http://www2.lingsoft.fi/cgi-bin/engcg
-
Morphological Analysis
Some definitions:
  Morphological analysis: split up words into component morphemes and build a (formal) representation of word-internal structure
  Morpheme: minimal meaning-bearing unit in a language
  Stem: morpheme that forms the central meaning unit in a word
  Affix: word element that can only occur attached to a stem
    Prefix: specific → unspecific (English)
    Suffix: wonder → wonderful (English)
    Infix: hingi → humingi (Tagalog)
    Circumfix: sagen → gesagt (German)
Morphological complexity varies between languages:
  Isolating languages (little or no inflectional morphology): e.g., Chinese
  Morphologically poor languages: e.g., English
  Morphologically complex languages: e.g., Turkish
Adapted from: http://courses.washington.edu/ling570/fei_fall07/10_15_morph.ppt
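For concatenative affixes (prefixes and suffixes) the definitions above translate into a simple search: try each known prefix and suffix and check whether what remains is a known stem. This is a minimal sketch over a hypothetical mini-lexicon; infixes (Tagalog) and circumfixes (German) are not concatenative and are not handled.

```python
# Hypothetical mini-lexicon of English stems and affixes. Only prefixes and
# suffixes are modelled; infixes and circumfixes would need other machinery.
STEMS = {"specific", "wonder", "kind", "help"}
PREFIXES = ["", "un"]
SUFFIXES = ["", "ful", "ness"]


def segment(word):
    """Return every split of `word` into (prefix) + stem + (suffix) with a known stem."""
    analyses = []
    for prefix in PREFIXES:
        for suffix in SUFFIXES:
            if len(word) <= len(prefix) + len(suffix):
                continue  # nothing left for the stem
            if word.startswith(prefix) and word.endswith(suffix):
                stem = word[len(prefix):len(word) - len(suffix)]
                if stem in STEMS:
                    analyses.append([m for m in (prefix, stem, suffix) if m])
    return analyses


print(segment("unspecific"), segment("wonderful"), segment("unkindness"))
```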
-
Morphological Analysis: Overview, Ambiguity
Inflection: stem + morpheme (same PoS class)
  writing → write + V + Progressive
  books → book + N + Plural
  writes → write + V + 3rd Person + Singular
  flies → fly + N + Plural
  flies → fly + V + 3rd Person + Singular
Derivation: stem + morpheme (different PoS class)
  civil → civilized → civilization
Compounding: multiple stems
  cabdriver → cab + driver
  doghouse → dog + house
  Flachbildschirm (flat screen) → flach + Bildschirm (flat screen)
  Flachbildschirm → Flachbild + Schirm (flat view screen)
  Flachbildschirm → flach + Bild + Schirm (flat picture screen)
Cliticization: stem + clitic
  I've → I + have
Adapted from: http://courses.washington.edu/ling570/fei_fall07/10_15_morph.ppt
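The compounding ambiguity of Flachbildschirm can be reproduced by enumerating every segmentation of a word into lexicon entries. A minimal recursive sketch, assuming a hypothetical five-entry lexicon; a real German compound splitter also has to handle linking elements (such as the "s" in Arbeitsamt = Arbeit + s + Amt) and rank the candidate splits, for example by corpus frequency.

```python
# Hypothetical mini-lexicon of German stems (lower-cased).
LEXICON = {"flach", "bild", "schirm", "bildschirm", "flachbild"}


def split_compound(word, lexicon=LEXICON):
    """Return every way to segment `word` into a sequence of lexicon entries."""
    word = word.lower()
    if not word:
        return [[]]  # one (empty) segmentation of the empty string
    splits = []
    for i in range(1, len(word) + 1):
        head = word[:i]
        if head in lexicon:
            for rest in split_compound(word[i:], lexicon):
                splits.append([head] + rest)
    return splits


for split in split_compound("Flachbildschirm"):
    print(" + ".join(split))
```

All three analyses from the slide come out, which is exactly why compound splitting needs a disambiguation step afterwards.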
-
Phrase Structure Analysis: Definitions
Phrase: group of words that functions as a single unit in syntax (Wikipedia)
  NP: Noun Phrase (the car, a clever student)
  VP: Verb Phrase (study hard, play the guitar)
  PP: Prepositional Phrase (in the class, above the earth)
  AP: Adjective Phrase (very tall, incredibly large)
Phrase Structure Analysis: breaking up a sentence into recursively defined coherent units (constituents), e.g., an NP consisting of several NPs; the first step in sentence parsing (see also further NLP layers)
Chunks: non-recursive phrases, as introduced by the shallow parsing approach
Chunking: also known as shallow parsing (without overall sentence structure & grammatical functions; see also further NLP layers)
With input from: http://www.kwary.net/linguistics/gl/GLSyntax01.ppt
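Chunking can be sketched as pattern matching over the PoS tag sequence. This minimal example assumes a simplified, hypothetical NP pattern (optional determiner, any adjectives, one or more nouns) and encodes the tags as a space-separated string so that Python's `re` module can find the chunks; the negative lookbehind `(?<![^ ])` makes every match start at a tag boundary.

```python
import re

# Hypothetical NP-chunk pattern over PoS tags: optional determiner, any
# adjectives, then one or more nouns (NN, NNS, NNP, NNPS).
NP_PATTERN = re.compile(r"(?<![^ ])(?:DT )?(?:JJ )*(?:NNP?S? )+")


def np_chunks(tagged):
    """Return the words of the non-recursive NPs (chunks) in a tagged sentence."""
    tags = "".join(tag + " " for _, tag in tagged)
    chunks = []
    for m in NP_PATTERN.finditer(tags):
        start = tags[:m.start()].count(" ")   # tokens before the match
        end = start + m.group(0).count(" ")   # tokens inside the match
        chunks.append(" ".join(word for word, _ in tagged[start:end]))
    return chunks


sentence = [("He", "PRP"), ("booked", "VBD"), ("the", "DT"), ("large", "JJ"),
            ("table", "NN"), ("in", "IN"), ("the", "DT"), ("corner", "NN")]
print(np_chunks(sentence))
```

Because chunks are non-recursive, the result is two flat NPs, "the large table" and "the corner", rather than the nested NP "the large table in the corner" that a full parser would build.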
-
Phrase Structure Analysis: NP, PP Example
NP → (Det) N
PP → P NP
  [NP [Det the] [N bus]] [PP [P in] [NP [Det the] [N yard]]]
NP → (Det) N (PP)
  [NP [Det the] [N bus] [PP [P in] [NP [Det the] [N yard]]]]
Adapted from: http://www.kwary.net/linguistics/gl/GLSyntax01.ppt
-
Phrase Structure Analysis: VP Example
VP → V (NP) (PP)
  [VP [V took] [NP [Det the] [N money]] [PP [P from] [NP [Det the] [N bank]]]]
Adapted from: http://www.kwary.net/linguistics/gl/GLSyntax01.ppt
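The three rules above (NP → (Det) N (PP), PP → P NP, VP → V (NP) (PP)) are enough for a small recursive-descent parser that returns every parse. This is a minimal sketch with a hypothetical eight-word lexicon; each function yields (bracketed tree, next position) pairs, and optional constituents are tried both absent and present. It reveals a structural ambiguity the slide's single tree hides: "from the bank" can attach to the VP or inside the NP.

```python
# Hypothetical lexicon covering only the example phrases.
LEXICON = {"the": "Det", "bus": "N", "yard": "N", "money": "N", "bank": "N",
           "in": "P", "from": "P", "took": "V"}


def word(cat, toks, i):
    """Match one word of category `cat` at position i."""
    if i < len(toks) and LEXICON.get(toks[i]) == cat:
        yield f"[{cat} {toks[i]}]", i + 1


def optional(parser, toks, i):
    """An optional constituent: either skip it or parse it."""
    yield None, i
    yield from parser(toks, i)


def determiner(toks, i):
    return word("Det", toks, i)


def noun_phrase(toks, i):  # NP -> (Det) N (PP)
    for det, j in optional(determiner, toks, i):
        for noun, k in word("N", toks, j):
            for pp, m in optional(prep_phrase, toks, k):
                parts = [x for x in (det, noun, pp) if x]
                yield "[NP " + " ".join(parts) + "]", m


def prep_phrase(toks, i):  # PP -> P NP
    for prep, j in word("P", toks, i):
        for obj, k in noun_phrase(toks, j):
            yield f"[PP {prep} {obj}]", k


def verb_phrase(toks, i):  # VP -> V (NP) (PP)
    for verb, j in word("V", toks, i):
        for obj, k in optional(noun_phrase, toks, j):
            for pp, m in optional(prep_phrase, toks, k):
                parts = [x for x in (verb, obj, pp) if x]
                yield "[VP " + " ".join(parts) + "]", m


def parse(parser, sentence):
    """Return all bracketed trees that span the whole sentence."""
    toks = sentence.split()
    return [tree for tree, end in parser(toks, 0) if end == len(toks)]


for tree in parse(verb_phrase, "took the money from the bank"):
    print(tree)
```

Recursive descent works here because none of the rules is left-recursive; grammars with rules such as NP → NP PP need a chart parser (e.g. CKY or Earley) instead.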
-
Phrase Structure Analysis: State of the Art
Overview of parsers (including shallow parsing) for English: http://www.aclweb.org/aclwiki/index.php?title=Parsers_for_English
Overview for other languages: http://www.aclweb.org/aclwiki/index.php?title=List_of_resources_by_language
Some shallow parser demos on the web:
  TreeTagger (PoS, chunking), decision trees, free research license: http://www.ims.uni-stuttgart.de/projekte/corplex/TreeTagger
  CNTS Memory Based Shallow Parser (Univ. of Antwerpen), classifier, license?: http://www.cnts.ua.ac.be/cgi-bin/jmeyhi/MBSP-instant-webdemo.cgi
  Univ. of Illinois at Urbana-Champaign, classifier, license?: http://l2r.cs.uiuc.edu/~cogcomp/shallow_parse_demo.php
-
Semantic Tagging
Definition and History
  Classification of words and phrases with a semantically defined category
  Nowadays associated with the Semantic Web (semantic annotation, knowledge markup) and Web 2.0 tagging
  In NLP, refers to assigning a sense to a word or phrase
Sense sets are defined by:
  originally, machine-readable dictionaries, e.g., LDOCE
  in recent years, wordnets (nouns) and framenets (verbs)
  increasingly, general & domain ontologies
-
Semantic Tagging: WordNet
WordNet is a Semantic Lexicon & Lexical Database
  Organized around meaning rather than word forms
  Maps words to meanings/interpretations or senses
  Senses are represented by synsets (sets of synonyms), e.g.,
    {board, plank}: piece of lumber
    {board, committee}: group of people
  Machine-readable (has a formal structure)
  Freely downloadable: http://wordnet.princeton.edu/
Integrated wordnets for several European languages, EuroWordNet: http://www.illc.uva.nl/EuroWordNet/
Wordnets for many languages with an interoperable format: http://www.globalwordnet.org/gwa/wordnet_table.htm
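The core data structure can be pictured as synsets plus an index from word forms to the synsets they occur in; a word listed under several synsets is ambiguous. A minimal in-memory sketch using the two "board" synsets from the slide; the synset IDs are hypothetical, and this is not the format of the WordNet database files (WordNet itself lists more senses of "board").

```python
from collections import defaultdict

# Two hand-written synsets for "board", with glosses as on the slide.
# The IDs are hypothetical.
SYNSETS = {
    "board_lumber": ({"board", "plank"}, "piece of lumber"),
    "board_group": ({"board", "committee"}, "group of people"),
}

# Index from word form to the synsets (senses) it occurs in.
INDEX = defaultdict(list)
for synset_id, (members, _gloss) in SYNSETS.items():
    for member in members:
        INDEX[member].append(synset_id)


def senses(word):
    """Synset IDs of `word`; more than one means the word is ambiguous."""
    return INDEX.get(word, [])


def synonyms(word):
    """All words sharing at least one synset with `word`."""
    return sorted({m for s in senses(word) for m in SYNSETS[s][0]} - {word})


print(senses("board"), synonyms("board"))
```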
-
Semantic Tagging: WordNet Origin
In 1985 a group of psychologists and linguists at Princeton University undertook to develop a lexical database
The initial idea was to provide an aid to use in searching dictionaries conceptually, rather than merely alphabetically
WordNet instantiates hypotheses based on results of psycholinguistic research, e.g.:
  In anomic aphasia, there is a specific inability to name objects. When confronted with an apple, say, patients may be unable to utter apple, even though they will reject such suggestions as shoe or banana, and will recognize that apple is correct when it is provided (Caramazza/Berndt 1978).
The aim was to expose such hypotheses to the full range of the common vocabulary
Miller, George A., Richard Beckwith, Christiane Fellbaum, Derek Gross and Katherine J. Miller. Introduction to WordNet: an on-line lexical database. In: International Journal of Lexicography 3 (4), 1990, pp. 235-244.
-
Semantic Tagging: Synsets, Senses
Synsets represent different senses: words that occur in several synsets have a corresponding number of senses, i.e., are ambiguous
-
Semantic Tagging: Ambiguity (cont.)
Homonymy: unrelated senses, e.g.
  The ball went over the fence. (artifact)
  The ball went on into the late hours. (event)
Systematic polysemy: related senses, e.g.
  The Boston office has been newly decorated. (building)
  The Boston office was founded in 1985. (organization)
  The Boston office called. (group of people)
Systematic polysemy is also referred to in the literature as regular polysemy (Apresjan 1973) or logical polysemy (Pustejovsky 1991, 1995); the term systematic polysemy was introduced by Nunberg & Zaenen (1992); see also Bierwisch 1983 (school example) and Hobbs et al. 1993 (office example)
-
Semantic Tagging: Synset Hierarchy
Synsets are organized in hierarchies, defining generalization (hypernymy) and specialization (hyponymy)
Example (upward: hypernymy; downward: hyponymy):
  {entity}
  {whole, unit}
  {building material}
  {lumber, timber}
  {board, plank}
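Hypernymy and hyponymy are inverse directions over the same links, so generalization is a walk up a chain and specialization a lookup of children. A minimal sketch using the synsets from the example as plain strings; the direct links are our simplification, since WordNet itself has further synsets between some of these levels.

```python
# Hypernym links following the slide's example (simplified: WordNet itself
# has further synsets between some of these levels).
HYPERNYM = {
    "{board, plank}": "{lumber, timber}",
    "{lumber, timber}": "{building material}",
    "{building material}": "{whole, unit}",
    "{whole, unit}": "{entity}",
}


def hypernym_chain(synset):
    """Generalize step by step up to the root."""
    chain = [synset]
    while chain[-1] in HYPERNYM:
        chain.append(HYPERNYM[chain[-1]])
    return chain


def hyponyms(synset):
    """Direct specializations: synsets whose hypernym is `synset`."""
    return sorted(child for child, parent in HYPERNYM.items() if parent == synset)


def is_a(synset, ancestor):
    return ancestor in hypernym_chain(synset)


print(" -> ".join(hypernym_chain("{board, plank}")))
```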
-
Semantic Tagging: Hierarchy Example
-
Semantic Tagging: Frame Ambiguity
Frame Placing: an Agent places a Theme at a location, the Goal. Example: David arranged his briefcase on the floor.
  Lexical units: archive.v, arrange.v, bag.v, bestow.v, billet.v, bin.v, bottle.v, box.v, brush.v, cage.v, cram.v, crate.v, dab.v, daub.v, deposit.v, drape.v, drizzle.v, dust.v, embed.v, emplace.v, file.v, garage.v, hang.v, heap.v, immerse.v, implant.v, inject.v, insert.v, insertion.n, jam.v, lay.v, lean.v, load.v, lodge.v, mount.v, pack.v, package.v, park.v, perch.v, pile.v, place.v, placement.n, plant.v, plunge.v, pocket.v, position.v, pot.v, put.v, rest.v, rub.v, set.v, sheathe.v, shelve.v, shoulder.v, shower.v, sit.v, situate.v, smear.v, sow.v, stable.v, stand.v, stash.v, station.v, stick.v, stow.v, stuff.v, tuck.v, warehouse.v, wrap.v
Frame Arranging: an Agent puts a complex Theme into a particular Configuration. Example: David arranged the stones in a circle.
  Lexical units: arrange.v, arrangement.n, array.v, deploy.v, deployment.n, format.v, set up.v
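Frame ambiguity amounts to a lexical unit appearing in more than one frame's list. The sketch indexes an excerpt of the lexical units above and adds a deliberately naive disambiguation rule: the cue phrases in `CONFIGURATION_CUES` and the function `guess_frame` are hypothetical and only cover this Placing/Arranging pair; a real semantic role labeller would use a classifier trained on frame-annotated text.

```python
# Excerpt of the lexical units (LUs) listed on the slide for two FrameNet frames.
FRAMES = {
    "Placing": {"arrange.v", "deposit.v", "lay.v", "place.v", "put.v", "set.v", "stow.v"},
    "Arranging": {"arrange.v", "arrangement.n", "array.v", "deploy.v", "deployment.n", "format.v"},
}
# Hypothetical cue phrases signalling a Configuration frame element.
CONFIGURATION_CUES = ("in a circle", "in a row", "in order")


def frames_for(lexical_unit):
    """All frames the LU can evoke; more than one means frame ambiguity."""
    return sorted(name for name, lus in FRAMES.items() if lexical_unit in lus)


def guess_frame(lexical_unit, sentence):
    """Toy disambiguation between Placing and Arranging only."""
    candidates = frames_for(lexical_unit)
    if "Arranging" in candidates and any(cue in sentence for cue in CONFIGURATION_CUES):
        return "Arranging"
    if "Placing" in candidates:
        return "Placing"
    return candidates[0] if candidates else None


print(frames_for("arrange.v"))
print(guess_frame("arrange.v", "David arranged the stones in a circle."))
```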
-
Semantic Tagging: Word Sense
Word Sense Disambiguation: classification of the correct sense of a word
  Based on wordnets & similar resources for many languages
  Sense-annotated corpora enable classifier training
  No longer a very active area of research in the NLP community
  Annotated corpora, tools and evaluation data sets are available from the SenseVal (1-4) evaluation campaigns: http://www.senseval.org/
  Recently, attention has turned to Semantic Role Labelling and a variety of other tasks in Computational Lexical Semantics; see the SemEval evaluation campaign: http://semeval2.fbk.eu/
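A classic knowledge-based baseline for word sense disambiguation is the simplified Lesk algorithm: choose the sense whose dictionary gloss shares the most content words with the context. The sketch below applies it to the two "board" senses; the glosses and the stopword list are our own hypothetical wording (not WordNet's), and ties simply go to the first sense listed.

```python
# Hypothetical glosses for the two senses of "board" (our own wording).
GLOSSES = {
    "board_lumber": "a long flat piece of sawn lumber used in building",
    "board_group": "a committee of people who manage or supervise an organization",
}
STOPWORDS = {"a", "an", "the", "of", "on", "in", "to", "and", "is", "was"}


def content_tokens(text):
    """Lower-cased words without trailing punctuation or stopwords."""
    return {w.strip(".,;:!?").lower() for w in text.split()} - STOPWORDS


def simplified_lesk(context, glosses=GLOSSES):
    """Pick the sense whose gloss shares the most content words with the context."""
    ctx = content_tokens(context)
    return max(glosses, key=lambda sense: len(ctx & content_tokens(glosses[sense])))


print(simplified_lesk("The board will meet to supervise the organization."))
print(simplified_lesk("He nailed a board of sawn lumber to the wall."))
```

Supervised approaches trained on the sense-annotated SenseVal corpora generally outperform such overlap heuristics, but Lesk-style methods need no training data.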
-
Semantic Tagging: Semantic Roles
Semantic Role Labelling: classification of the correct frame category (sense) of a verb & assignment of semantic roles to its syntactic arguments
  Based on the availability of FrameNet and similar resources, e.g.
    PropBank http://verbs.colorado.edu/~mpalmer/projects/ace.html
    NomBank http://nlp.cs.nyu.edu/meyers/NomBank.html
    VerbNet http://verbs.colorado.edu/~mpalmer/projects/verbnet.html
    SemLink http://verbs.colorado.edu/semlink/
    OntoNotes http://www.bbn.com/ontonotes/
    German FrameNet http://www.coli.uni-saarland.de/projects/salsa
  Frame-annotated corpora enable classifier training
  Recently a very active area of research in the NLP community
-
Semantic Tagging: State of the Art
Word Sense Disambiguation tools, selection:
  WSD tools by Ted Pedersen (University of Minnesota, Duluth), free: http://sourceforge.net/projects/wsdgate/ & others
  SenseLearner, Rada Mihalcea (Univ. of North Texas), free: http://www.cse.unt.edu/~rada/downloads.html#senselearner
  SuperSenseTagger, SemTechLab Rome ?, license?: http://sourceforge.net/projects/supersensetag/
Semantic Role Labelling tools, selection:
  Shalmaneser (Saarland Univ.), pluggable parsing & classifiers, free license: http://www.coli.uni-saarland.de/projects/salsa/shal/
  Univ. of Illinois at Urbana-Champaign, parsing & classifiers, license?: http://l2r.cs.uiuc.edu/~cogcomp/srl-demo.php
  SWIRL (Universitat Politecnica de Catalunya), parsing & classifiers, GPL license: http://www.surdeanu.name/mihai/swirl
-
Semantic Tagging: Lexical Inference
Metonymy (part stands for whole)
  The Boston office called.
  to call expects an object of type Human in Agent position
  coerce office into an object of type (Group-of) Person > Human
  lexical semantic inference: Person Work-at Office
[Figure: the word office mapped to the class Office, which is connected to the classes Organization, Building and Person through the relations Has-address, Located-at, Representation-of, Work-at and Work-for]
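The coercion step can be read as a search for an ontology relation that connects the argument's type to the type the verb expects. A minimal sketch: the subclass link and the three relation triples below are assumptions chosen to match the class and relation names in the figure, not the figure's exact edges.

```python
# Hypothetical ontology fragment: subclass links and typed relations
# (subject class, relation, object class). The edges are assumptions.
SUBCLASS_OF = {"Person": "Human"}
RELATIONS = [
    ("Person", "Work-at", "Office"),
    ("Person", "Work-for", "Organization"),
    ("Office", "Located-at", "Building"),
]


def is_subtype(cls, expected):
    """Walk up the subclass chain from `cls` looking for `expected`."""
    while cls is not None:
        if cls == expected:
            return True
        cls = SUBCLASS_OF.get(cls)
    return False


def coerce(arg_type, expected_type):
    """Return (coerced type, relation used), or None if no single relation helps."""
    if is_subtype(arg_type, expected_type):
        return arg_type, None  # no coercion needed
    for subj, rel, obj in RELATIONS:
        if obj == arg_type and is_subtype(subj, expected_type):
            return subj, rel
    return None


# "The Boston office called": call expects a Human agent, office is an Office.
print(coerce("Office", "Human"))
```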
-
Semantic Tagging: Lexical Inference
Metonymy in bridging (of discourse referents)
  Peter bought a car. The engine runs well.
  the engine refers to an already introduced object (discourse referent)
  lexical semantic inference: Engine Part-of Car
[Figure: the word car mapped to the class Car, linked to the class Engine by Has-part / Part-of]
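Bridging resolution can use the same kind of lexical knowledge: a definite NP with no matching antecedent is linked to an earlier referent it can be a part of. A minimal sketch, assuming a hypothetical `PART_OF` table and representing discourse referents simply by their head nouns in order of mention.

```python
# Hypothetical Part-of relations (part -> whole).
PART_OF = {"engine": "car", "wheel": "car", "roof": "house"}


def resolve_bridging(definite_noun, referents):
    """Link a definite NP such as 'the engine' to the most recent
    referent it can be a part of, or return None."""
    whole = PART_OF.get(definite_noun)
    for referent in reversed(referents):  # most recent referent first
        if referent == whole:
            return referent
    return None


# "Peter bought a car. The engine runs well."
print(resolve_bridging("engine", ["Peter", "car"]))
```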
-
Semantic Tagging: Ontologies
[Figure: a small ontology with the classes University, School, Campus, Student and Staff, connected by the relations is_part_of, located_at, studies_at and works_at; classes carry labels such as "school" and "staff"]
-
Semantic Tagging: Classes, Terms
[Figure: the same ontology, with terms for the class School in several languages: has_German_term Fakultät, has_US-English_term School, has_Dutch_term Faculteit]
RDF(S) & OWL current status
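In RDF(S) and OWL, multilingual terms for a class are usually attached as `rdfs:label` values with language tags. The sketch serializes the three terms from the figure as N-Triples lines; the namespace `http://example.org/university#` is hypothetical, and the code does not escape quotes or backslashes inside terms, which a real serializer would have to do. Note how little linguistic information survives: only the string and its language tag.

```python
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
BASE = "http://example.org/university#"  # hypothetical namespace

# Terms for the class School in three languages, as in the figure.
TERMS = {"School": {"de": "Fakultät", "en-US": "School", "nl": "Faculteit"}}


def label_triples(terms=TERMS):
    """Serialize each term as an rdfs:label with a language tag (N-Triples)."""
    lines = []
    for cls, by_lang in terms.items():
        for lang, term in sorted(by_lang.items()):
            lines.append(f'<{BASE}{cls}> <{RDFS_LABEL}> "{term}"@{lang} .')
    return lines


for line in label_triples():
    print(line)
```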
-
Semantic Tagging: Lexicalized Ontologies
[Figure: LingInfo model for the class OralMucosa and its German term Mundschleimhaut (language DE, part of speech N), decomposed into the stems Mund and Schleimhaut; relations shown: hasLingInfo, hasOrthographicForm, hasLang, hasMorphSynInfo, hasPoS, hasStem, instanceOf; a second class, Mucosa, is also linked via hasLingInfo]
http://olp.dfki.de/LingInfo/
http://ontoware.org/projects/lexonto/
-
Semantic Tagging: Terms, Classes
Semantic tagging beyond word senses & semantic roles: Terms, Classes, Relations, Properties/Attributes; Names
Terms, Classes, Relations, Properties/Attributes:
  Semantic annotation on the basis of a thesaurus or ontology
  Term recognition & extraction: terms are domain-specific phrases
  Relation extraction: relations are domain-specific semantic roles
  Ontology-based information extraction
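Term recognition often starts from linguistic filters: sequences of adjectives and nouns ending in a noun are candidate terms, and frequent candidates are kept. A minimal sketch of that first step; the tag pattern, the `min_freq` threshold and the two tagged sentences are illustrative assumptions, and real term extractors add statistical ranking (for example termhood measures) on top.

```python
import re
from collections import Counter

# Hypothetical candidate-term pattern over PoS tags: adjectives/common nouns
# ending in a common noun, anchored at a tag boundary.
PATTERN = re.compile(r"(?<![^ ])(?:(?:JJ|NNS?) )*NNS? ")


def term_candidates(tagged_sentences, min_freq=2):
    """Count pattern matches over tagged sentences; keep the frequent ones."""
    counts = Counter()
    for tagged in tagged_sentences:
        tags = "".join(tag + " " for _, tag in tagged)
        for m in PATTERN.finditer(tags):
            start = tags[:m.start()].count(" ")
            end = start + m.group(0).count(" ")
            counts[" ".join(w.lower() for w, _ in tagged[start:end])] += 1
    return [(term, n) for term, n in counts.most_common() if n >= min_freq]


corpus = [
    [("IL-1beta", "NN"), ("inhibits", "VBZ"), ("insulin", "NN"), ("secretion", "NN")],
    [("Insulin", "NN"), ("secretion", "NN"), ("is", "VBZ"), ("inhibited", "VBN")],
]
print(term_candidates(corpus))
```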
-
Semantic Tagging: Terms, Relations
Terms/classes & relations in the genetics domain: GENIA corpus
  http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/home/wiki.cgi?page=GENIA+corpus
Examples:
  Interleukin 1 beta inhibits insulin secretion
  IL-1beta is known to inhibit insulin secretion
  Insulin secretion is inhibited by IL-1 beta
Term recognition & extraction
Grammatical function annotation: subject, direct object, etc. (see further NLP layers)
GENIA Relation | GENIA Term (Class)            | Semantic Role | Grammatical Function
inhibit        | interleukin 1 beta / IL-1beta | Agent         | Subject
inhibit        | insulin secretion             | Target        | Direct Object
With input from: http://www.lrec-conf.org/proceedings/lrec2008/slides/496.ppt
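The three example sentences express one relation, but with different grammatical functions: in the passive, the subject is the Target and the by-phrase the Agent. The sketch normalizes clauses into a single (relation, Agent, Target) triple. It assumes a parser has already produced the `Clause` records (written by hand here) and uses a hypothetical `CANONICAL` synonym table to merge term variants.

```python
from typing import NamedTuple

# Hypothetical synonym table mapping term variants to one canonical term.
CANONICAL = {"interleukin 1 beta": "IL-1beta", "il-1 beta": "IL-1beta", "il-1beta": "IL-1beta"}


class Clause(NamedTuple):
    subject: str   # grammatical subject
    lemma: str     # predicate lemma
    other: str     # direct object (active) or by-phrase (passive)
    passive: bool


def normalize(term):
    return CANONICAL.get(term.lower(), term.lower())


def to_relation(clause):
    """Map grammatical functions to (relation, Agent, Target)."""
    if clause.passive:
        agent, target = clause.other, clause.subject
    else:
        agent, target = clause.subject, clause.other
    return clause.lemma, normalize(agent), normalize(target)


examples = [
    Clause("Interleukin 1 beta", "inhibit", "insulin secretion", False),
    Clause("IL-1beta", "inhibit", "insulin secretion", False),  # "is known to inhibit"
    Clause("Insulin secretion", "inhibit", "IL-1 beta", True),
]
print({to_relation(c) for c in examples})
```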
-
Semantic Tagging: Terms, Classes
Eurovoc Thesaurus: terminology in all EU languages on all EU areas: politics, trade, law, science, energy, agriculture, ...
  MT 3606 natural and applied sciences
  UF gene pool; genetic resource; genotype; heredity
  BT1 biology
  BT2 life sciences
  NT1 DNA
  RT genetic engineering (6411)
Medical Subject Headings (MeSH): thesaurus with a taxonomy of ~250,000 terms, representing medical subjects for retrieval purposes
  MeSH Heading: Databases, Genetic
  Entry Terms: Genetic Databases; Genetic Sequence Databases; OMIM; Mendelian Inheritance in Man; Genetic Data Banks; Genetic Data Bases; Genetic Information Databases
  See Also: Genetic Screening
Gene Ontology
  Accession: GO:0009292
  Synonyms: broad: genetic exchange
  Term Lineage: all : all (164142); GO:0008150 : biological process (115947); GO:0007275 : development (11892); GO:0009292 : genetic transfer (69)
-
Semantic Tagging: Names
Semantic tagging beyond word senses & semantic roles: Terms, Classes, Relations, Properties/Attributes; Names
Semantic annotation of names: Named Entity Recognition
  Originally intended as an extension of tokenization, e.g. in recognizing names and other specific tokens such as dates and times
  Evolved into a more general identification and classification of names of people, organisations, companies, countries, cities, etc.
  Currently merging with ontology-based information extraction
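The simplest form of named entity recognition combines a gazetteer (a list of known names with their classes) with patterns for specific tokens such as dates. A minimal sketch on two sentences from the news example: the gazetteer entries and the day-name pattern are hypothetical stand-ins, and overlapping matches are not resolved; trained NER systems generalize to names that are not in any list.

```python
import re

# Hypothetical gazetteer (surface form -> class).
GAZETTEER = {
    "Barack Obama": "Person",
    "Hillary Rodham Clinton": "Person",
    "John McCain": "Person",
    "St. Paul": "City",
    "Illinois": "State",
}
# Day names as a simple stand-in for date/time patterns.
DAY = re.compile(r"\b(?:Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)\b")


def tag_entities(text):
    """Return (surface form, class) pairs in order of appearance."""
    found = [(m.start(), name, cls)
             for name, cls in GAZETTEER.items()
             for m in re.finditer(re.escape(name), text)]
    found += [(m.start(), m.group(0), "Date") for m in DAY.finditer(text)]
    return [(name, cls) for _, name, cls in sorted(found)]


text = ("Sen. Barack Obama sealed the Democratic presidential nomination Tuesday night. "
        "He now faces another hard-fought battle, against Sen. John McCain.")
print(tag_entities(text))
```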
-
Semantic Tagging: State of the Art (cont.)
Named Entity Recognition: good overview of many available tools: http://en.wikipedia.org/wiki/Named_entity_recognition
Semantic annotation with thesauri, ontologies in various domains, e.g.,
  Annotate biomedical text with the UMLS Metathesaurus: MetaMap (US National Library of Medicine), free license, http://mmtx.nlm.nih.gov/
  Annotate business text with the KIM ontology: KIM (Ontotext), free research license, http://www.ontotext.com/kim/
  Annotate football (soccer) text with the SWIntO ontology: SProUT (DFKI), free research license, http://www.dfki.de/sw-lt/heartofgold/ (web demo)
-
Parsing: Overview
Parsing, or syntactic analysis, is the process of analyzing a sequence of tokens to determine their grammatical structure with respect to a given grammar (Wikipedia)
Shallow parsing (discussed above) provides part-of-speech tags and non-recursive phrases (chunks)
Full (or deep) parsing provides, on top of this:
  Constituent structure: complete syntactic structure in terms of interconnected recursive phrases, and/or
  Clause structure: predicate (mostly a verb) and one or more syntactic arguments (phrases), with grammatical functions for predicate arguments: subject, direct object, etc., and/or
  Dependency structure: head-modifier analysis, semantic roles
-
Parsing: Full Parse Example
S: He booked the large table in the corner.
  NP "he": Subject, Agent
    he: Pronoun, 3rd person, Animate
  VP "booked the large table in the corner"
    book: Verb, Past, 3rd person; Head, Predicate
    NP "the large table in the corner": Direct Object, Patient
      NP "the large table"
        large: Adjective; Modifier
        table: Noun, Singular; Head; furniture_01
      PP "in the corner"
        in: Preposition; Head, Predicate
        NP "the corner"
-
Parsing: Full Parse Example (same analysis of "He booked the large table in the corner."), highlighting: Part of Speech, Morphology
-
Parsing: Full Parse Example (same analysis of "He booked the large table in the corner."), highlighting: Phrases
-
Parsing: Full Parse Example (same analysis of "He booked the large table in the corner."), highlighting: Predicates, Grammatical Functions
-
Parsing: Full Parse Example (same analysis of "He booked the large table in the corner."), highlighting: Semantic Tags, Semantic Roles; the PP "in the corner" is additionally marked as Modifier
-
Parsing: Full Parse Example (same analysis of "He booked the large table in the corner."), highlighting: Head-Modifier Analysis (the PP "in the corner" as Modifier)
-
Parsing: Dependency Structure
He booked the large table in the corner.
  book: Verb, Past, 3rd person; Head, Predicate
    Agent: he (Pronoun, 3rd person, Animate)
    Patient: table (Noun, Singular; Head; furniture_01)
      Size: large (Adjective; Modifier)
      Location: corner (Noun, Singular; Modifier)
-
8/3/2019 Nlp for Semantic Web
68/78
Digital Enterprise Research Institute www.deri.ie
68
Parsing: Dependency Structure for IE
He booked the large table in the corner.
(Same dependency structure as above: book with Agent "he" and Patient "table"; "table" with Size "large" and Location "corner".)

Class + Properties   Extracted Objects & Values                        Source
Booking              x, y                                              Predicate
Booking-sponsor      Male(x)                                           Agent
Booking-order        Table(y), Size(large), Location(y,z), Corner(z)   Patient
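The table above can be produced mechanically from the dependency edges: the predicate yields the Booking class, each semantic role of the predicate is mapped to a template slot, and modifiers of each argument become properties. In the sketch below, the `LEXICON` (word to predicate name, e.g. "he" to Male), the `SLOTS` role-to-slot mapping, and the variable naming scheme are all hypothetical assumptions chosen to reproduce the slide's output.

```python
EDGES = [
    ("book", "Agent", "he"),
    ("book", "Patient", "table"),
    ("table", "Size", "large"),
    ("table", "Location", "corner"),
]
# Hypothetical lexicon: how an entity word becomes a predicate over its variable.
LEXICON = {"he": "Male", "table": "Table", "corner": "Corner"}
# Hypothetical mapping from the predicate's semantic roles to Booking template slots.
SLOTS = {"Agent": "Booking-sponsor", "Patient": "Booking-order"}
VARIABLE_NAMES = "xyzuvw"  # enough for this sketch; not a general scheme


def fill_template(edges, predicate="book"):
    variables = {}

    def var(word):
        """Assign x, y, z, ... to entity words in order of first mention."""
        if word not in variables:
            variables[word] = VARIABLE_NAMES[len(variables)]
        return variables[word]

    template = {"Booking": []}
    for head, rel, dep in edges:
        if head != predicate or rel not in SLOTS:
            continue
        v = var(dep)
        template["Booking"].append(v)
        facts = [f"{LEXICON[dep]}({v})"]
        for mod_head, mod_rel, mod in edges:
            if mod_head != dep:
                continue
            if mod in LEXICON:
                # Nominal modifier: a relation between two entities.
                w = var(mod)
                facts += [f"{mod_rel}({v},{w})", f"{LEXICON[mod]}({w})"]
            else:
                # Adjectival modifier: an attribute value.
                facts.append(f"{mod_rel}({mod})")
        template[SLOTS[rel]] = facts
    return template


for slot, facts in fill_template(EDGES).items():
    print(slot, facts)
```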
Parsing: State of the Art
Widely-used parsers:
MINIPAR, Dekang Lin, free research license. Download: http://www.cs.ualberta.ca/~lindek/minipar.htm Web demo: http://dbis.nankai.edu.cn/miniparweb/
Stanford Parser, Klein/Manning, free research license: http://nlp.stanford.edu/software/lex-parser.shtml Web demo: http://nlp.stanford.edu:8080/parser/
RASP Parser (Sussex Univ.), Briscoe/Carroll, free research license: http://www.informatics.susx.ac.uk/research/groups/nlp/rasp/
Link Grammar Parser (CMU), Temperley et al., free license: http://www.link.cs.cmu.edu/link/ Web demo: http://nlp.stanford.edu:8080/parser/
Discourse Analysis: Anaphora Resolution
(With input from: http://www.ling.su.se/DaLi/education/courses/ngslt_nlp06/PragmaticsGSLTLecture06.ppt)
Linking event participants (semantic role fillers) within and across sentences, i.e., an anaphor can be linked back to a discourse referent that serves as its antecedent, e.g.:
He bought a bottle of wine, sat down on a stone, and drank it.
"he" and "it" are anaphors
"a bottle of wine" and "a stone" introduce discourse referents
"it" can be linked back to the antecedent "a bottle of wine" or "a stone"
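The ambiguity in this example can be made concrete with a minimal resolver sketch. Agreement (animacy and number) keeps both "a bottle of wine" and "a stone" as candidates for "it"; a pure recency heuristic then picks the wrong one, and only a selectional restriction of "drink" recovers the intended antecedent. The `Mention` class, its `drinkable` feature, and the recency heuristic are assumptions for illustration, not a description of any specific system.

```python
from dataclasses import dataclass


@dataclass
class Mention:
    text: str
    position: int            # order of mention in the discourse
    animate: bool
    plural: bool
    drinkable: bool = False  # hypothetical semantic feature for the selectional check


# Candidate discourse referents for "He bought a bottle of wine, sat down on a stone, and drank it."
CANDIDATES = [
    Mention("a bottle of wine", 1, animate=False, plural=False, drinkable=True),
    Mention("a stone", 2, animate=False, plural=False),
]


def agreeing(candidates, animate, plural):
    """Keep candidates whose animacy and number match the pronoun."""
    return [m for m in candidates if m.animate == animate and m.plural == plural]


def resolve_it(candidates, requires_drinkable=False):
    """Resolve 'it' (inanimate, singular); optionally apply the verb's selectional restriction."""
    pool = agreeing(candidates, animate=False, plural=False)
    if requires_drinkable:
        pool = [m for m in pool if m.drinkable]
    if not pool:
        return None
    # Recency heuristic: choose the most recently mentioned remaining candidate.
    return max(pool, key=lambda m: m.position).text


print(resolve_it(CANDIDATES))                           # a stone
print(resolve_it(CANDIDATES, requires_drinkable=True))  # a bottle of wine
```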
Discourse Analysis: Anaphora Resolution
S: He booked the large table in the corner.
  NP "he": Subject, Agent; discourse referent X
    Pronoun "he": 3rd person, Animate
  VP "booked the large table in the corner"
    Verb "book": Past, 3rd person, Head, Predicate
    NP "the large table in the corner": Direct Object, Patient, Definite; discourse referent Y
      NP "the large table"
        Adjective "large": Modifier
        Noun "table": Singular, Head, furniture_01
      PP "in the corner"
        Preposition "in": Head, Predicate
        NP "the corner": Definite; discourse referent Z
S: ... It was still available.
  NP "it": Subject, Patient; referent Y
    Pronoun "it": 3rd person, Inanimate
  VP "was still available"
    Verb "is": Past, 3rd person, Head, Predicate
    AdvP "still available"
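In this example, agreement alone rules out X ("he" is Animate, "it" is Inanimate) but leaves both Y and Z. One common way to choose is a salience ranking by grammatical function, preferring subjects over direct objects over PP-embedded (oblique) NPs. The sketch below encodes that preference; the numeric `SALIENCE` weights and the referent records are illustrative assumptions, and real resolvers combine many more factors.

```python
# Salience by grammatical function, higher is more salient (illustrative ordering, an assumption).
SALIENCE = {"Subject": 3, "DirectObject": 2, "Oblique": 1}

# Discourse referents introduced by "He booked the large table in the corner."
REFERENTS = {
    "X": {"text": "he", "function": "Subject", "animate": True},
    "Y": {"text": "the large table in the corner", "function": "DirectObject", "animate": False},
    "Z": {"text": "the corner", "function": "Oblique", "animate": False},
}


def resolve_pronoun(animate, referents):
    """Filter referents by animacy, then return the id of the most salient survivor."""
    compatible = {rid: r for rid, r in referents.items() if r["animate"] == animate}
    if not compatible:
        return None
    return max(compatible, key=lambda rid: SALIENCE[compatible[rid]["function"]])


print(resolve_pronoun(animate=False, referents=REFERENTS))  # Y  ("It was still available.")
print(resolve_pronoun(animate=True, referents=REFERENTS))   # X
```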
Discourse Analysis: Discourse Structure
(With input from: http://www.ling.su.se/DaLi/education/courses/ngslt_nlp06/PragmaticsGSLTLecture06.ppt)
Linking events in terms of temporal sequence, causality, etc., e.g.:
John bought a Mercedes, so Bill leased a BMW. (temporal sequence)
John hid Bill's car keys as he had drunk too much. (causality)
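Both examples contain an explicit connective ("so", "as"), and a first approximation to discourse structure is to split a sentence at such a cue word and label the relation from a cue lexicon. The `CUES` table below is a hypothetical assumption that follows the labels on this slide; cue words are ambiguous in general ("as" can also be temporal), and many discourse relations have no explicit connective at all.

```python
import re

# Hypothetical cue lexicon: explicit connective -> discourse relation (labels follow the slide).
CUES = {
    "so": "temporal sequence",
    "then": "temporal sequence",
    "because": "causality",
    "as": "causality",
}


def discourse_relation(sentence):
    """Split a sentence at the first known connective: (left clause, relation, right clause)."""
    tokens = re.findall(r"[\w']+|[^\w\s]", sentence)
    for i, token in enumerate(tokens):
        relation = CUES.get(token.lower())
        if relation and i > 0:
            left = " ".join(tokens[:i]).rstrip(" ,")
            right = " ".join(tokens[i + 1:]).rstrip(" .")
            return left, relation, right
    return None


print(discourse_relation("John bought a Mercedes, so Bill leased a BMW."))
print(discourse_relation("John hid Bill's car keys as he had drunk too much."))
```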
Discourse Analysis: State of the Art
No readily available black-box tools; anaphora resolution is often built into NER, parsing, and similar components.
To experiment with discourse referents, anaphora resolution, etc., try e.g. Boxer:
Johan Bos, Univ. of Rome, http://svn.ask.it.usyd.edu.au/trac/candc/wiki/boxer
Further Relevant Pointers: General Tools
GATE, Univ. of Sheffield: "Eclipse of Natural Language Engineering", http://gate.ac.uk/
UIMA, IBM / open source: "Open, Industrial-Strength Platform for Unstructured Information Analysis and Search", http://incubator.apache.org/uima/
NLTK (Natural Language Toolkit), Melbourne Univ.: open source Python modules for research and development in natural language processing; book (June 2009): Natural Language Processing with Python, http://www.nltk.org/
MBT: memory-based tagger-generator and tagger, Univ. of Tilburg/Antwerpen; can generate a sequence tagger on the basis of a training set of tagged sequences, http://ilk.uvt.nl/mbt/
SProUT, DFKI: platform for the development of multilingual shallow text processing and information extraction systems, http://sprout.dfki.de/
Further Relevant Pointers: Publications
Conferences
Association for Computational Linguistics: ACL (Int.), EACL (Europe), NAACL (North America), IJCNLP (AFNLP, Asia), http://www.aclweb.org/; ACL SIGs: http://aclweb.org/aclwiki/index.php?title=Special_interest_groups
International Conference on Computational Linguistics, COLING: http://nlp.shef.ac.uk/iccl/
International Conference on Language Resources and Evaluation, LREC: http://www.lrec-conf.org/
Other NLP conferences: EMNLP, CoNLL, RANLP, CICLing
Journals
Computational Linguistics, MIT Press
Natural Language Engineering, Cambridge University Press
Journal of Logic, Language and Information, Springer
Language Resources and Evaluation, Springer
Further Relevant Pointers: More Reading
Handbooks
Handbook of Natural Language Processing, CRC Press, 2000 (new edition in progress, 2009)
Speech and Language Processing: An Introduction to Natural Language Processing, Computational Linguistics, and Speech Recognition, Prentice Hall, 2008
The Oxford Handbook of Computational Linguistics, Oxford University Press, 2005
Foundations of Statistical Natural Language Processing, MIT Press, 2003
Relevant Mailing Lists
Corpora list: http://gandalf.aksis.uib.no/corpora/
Linguist list: http://linguistlist.org/
Other NLP sites (broad overviews of tools, resources, people)
ACL Wiki: http://aclweb.org/aclwiki
LT World: http://www.lt-world.org/
Thanks!
Further Questions: