customizing kim3
Post on 05-Apr-2018
7/31/2019 Customizing KIM3
http://ontotext.com/kim
Customizing KIM: A step-by-step guide for integrating a new ontology in KIM, incorporating it in the default text analysis pipeline, and extracting new types of entities and facts.
Table of Contents
1. Some background and description of the task
2. Prerequisites
3. Procedure
3.1 Importing DBpedia in the KIM semantic repository (OWLIM) and making its other data sources aware of this new ontology
3.1.1 Importing DBpedia in OWLIM
3.1.2 Mapping DBpedia to PROTON
3.1.3 Marking the origin of the DBpedia instances
3.1.4 Managing Labels
3.2 Incorporating DBpedia in the default KIM IE pipeline
3.2.1 Loading the gazetteer lists
3.2.2 Setting up the gazetteer processing resource
3.2.3 Adding the gazetteer to the IE pipeline and seeing it in action
3.3 Creating annotations (Grammar Rules)
3.4 The web interface
3.5 Changing the visibility of resources
KIM can be customized in multiple ways to suit different semantic annotation and search needs. One way to do this is to change the text analysis pipeline to find new types of entities and facts, and to use the conceptual models and instance bases relevant to a certain domain.
This guide describes the methods of adopting a third-party ontology (DBpedia) in KIM, incorporating it in the default IE pipeline, and making the pipeline aware of the knowledge base for this new mapped ontology.
1. Some background and description of the task
The resources in the KIM default IE pipeline depend on the KIM default ontology, the PROTON ontology. It is the formal structure of the KIM knowledge base. PROTON is a generic upper-level ontology, which consists of about 100 classes and 300 properties covering general worldly notions. If you want to use the full functionality of KIM with a new ontology, the best approach is to map it to PROTON.
Due to the complexity of the IE process, adding a new ontology to KIM and making KIM aware of it is not a one-step process. If the PROTON ontology and the new one cover very similar domains, you only have to align them, which means mapping the terms that are used for the same notions. But if they are completely different, then you have to go through all the stages of integrating an ontology in KIM. In most cases, the task is a mixture of both: part of the new ontology may be usable by just aligning it to PROTON, while the other part requires making the processing resources aware of it (adding it to lists, creating new grammar rules, etc.).
For additional information about how to extend the KIM Information Extraction capabilities, please see the respective section in the KIM administrator's guide:
http://www.ontotext.com/kim/getting-started/documentation
2. Prerequisites
In order to integrate the DBpedia ontology in KIM by incorporating it in the KIM default IE pipeline, you will need:
- A KIM installation, which can be downloaded from http://ontotext.com/kim/KIM-download.html. For simplicity, this guide refers to the KIM installation folder as KIM.
- A DBpedia extract, a small subset of the original DBpedia (http://dbpedia.org) ontology: dbpedia-ontology.zip
Figure 1. An extract of DBpedia ontology classes
3. Procedure
3.1 Importing DBpedia in the KIM semantic repository (OWLIM) and making its other data sources aware of this new ontology
3.1.1 Importing DBpedia in OWLIM
- Create a sub-folder dbpedia in the KIM context folder (KIM/context/default/kb/dbpedia/). It will be used as storage for all the RDF data in this task.
- Put the dbpedia_3.5.1.owl file, containing the DBpedia ontology, in the dbpedia folder.
- Put dbpedia_instances.nt, containing the actual object descriptions, in KIM/context/default/kb/dbpedia.
- Tell OWLIM to load these additional RDF data at start-up. Add the two files dbpedia_3.5.1.owl and dbpedia_instances.nt to the list of imports and defaultNS definitions in KIM/config/owlim.ttl:
owlim:imports
.....
kb/dbpedia/dbpedia_3.5.1.owl;
kb/dbpedia/dbpedia_instances.nt;";
owlim:defaultNS
.....
http://dbpedia.org/resource/;
http://dbpedia.org/resource/;";
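The imports and defaultNS lists are extended several more times in this guide, once for every new RDF file. The following is a minimal Python sketch of that bookkeeping. It assumes, as the excerpt above suggests, that each value is a single semicolon-separated string. The helper name append_entries is hypothetical and is not part of KIM or OWLIM.

```python
def append_entries(current, new_entries):
    """Append entries to a semicolon-separated list, skipping duplicates.

    `current` is the existing value (e.g. the text between the quotes of
    owlim:imports); `new_entries` is a list of paths or namespaces to add.
    The trailing semicolon mirrors the style of the excerpt above.
    """
    entries = [e.strip() for e in current.split(";") if e.strip()]
    for entry in new_entries:
        if entry not in entries:
            entries.append(entry)
    if not entries:
        return ""
    return ";".join(entries) + ";"


if __name__ == "__main__":
    imports = "kb/dbpedia/dbpedia_3.5.1.owl;"
    print(append_entries(imports, ["kb/dbpedia/dbpedia_instances.nt"]))
```

The edited value still has to be pasted back into KIM/config/owlim.ttl by hand; the sketch does not parse or rewrite that file.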
Now you have a running KIM with the DBpedia ontology loaded, but the new ontology is still isolated from the rest of KIM and has little effect on the IE process.
3.1.2 Mapping DBpedia to PROTON
When mapping ontologies, you look for classes that are identical and can be directly mapped. Here, the DBpedia classes Place, Organisation and Person can be directly mapped to Location, Organization and Person in PROTON.
@prefix rdfs: .
@prefix protons: .
@prefix protont: .
@prefix dbpedia: .
dbpedia:Place rdfs:subClassOf protont:Location .
dbpedia:Organisation rdfs:subClassOf protont:Organization .
dbpedia:Person rdfs:subClassOf protont:Person .
The rest of the classes do not have a direct equivalent in PROTON. Therefore, you can sub-class them to the more general class of entities, in this case protons:Entity:
dbpedia:Activity rdfs:subClassOf protons:Entity .
dbpedia:AnatomicalStructure rdfs:subClassOf protons:Entity .
........
Create a new file (e.g. dbpedia_proton.nt) in the dbpedia folder and put in it the mapping statements that sub-class the classes of the new ontology to their PROTON equivalents.
Finally, add a record of this mapping file to the imports and defaultNS sections in KIM/config/owlim.ttl.
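Writing one mapping statement per class by hand is error-prone when the ontology has many classes. The sketch below generates the statements from a list of DBpedia class names: the three direct mappings come from this guide, and every other class falls back to protons:Entity. The function name and the sample class list are illustrative assumptions. The @prefix declarations are deliberately not generated, because the namespace IRIs must be taken from your own PROTON and DBpedia files.

```python
# Direct equivalents named in this guide; all other classes use the fallback.
DIRECT_MAPPINGS = {
    "Place": "protont:Location",
    "Organisation": "protont:Organization",
    "Person": "protont:Person",
}


def mapping_statements(class_names, direct=DIRECT_MAPPINGS,
                       fallback="protons:Entity"):
    """Return one rdfs:subClassOf statement (prefixed form) per class name."""
    statements = []
    for name in class_names:
        target = direct.get(name, fallback)
        statements.append(f"dbpedia:{name} rdfs:subClassOf {target} .")
    return statements


if __name__ == "__main__":
    for line in mapping_statements(["Place", "Person", "Activity"]):
        print(line)
```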
Now you have entities from both ontologies (PROTON and DBpedia) in the semantic repository (OWLIM). But in order to properly use them for recognizing their mentions in texts, you need a way to differentiate them.
3.1.3 Marking the origin of the DBpedia instances
You can specify their origin with the property protons:generatedBy. This way you mark every instance from DBpedia as being generated by http://dbpedia.org/page/DBpedia.
@prefix protons: .
@prefix dbpage: .
protons:generatedBy dbpage:DBpedia .
protons:generatedBy dbpage:DBpedia .
........
There are various approaches to retrieving the complete list of entities. In this case, you can use regular expressions and bash to extract them from the dbpedia_instances.nt file. Create the RDF and store it in a file called dbpedia_generated_by.n3. Put the file in the dbpedia folder, and then add it (kb/dbpedia/dbpedia_generated_by.n3) to the list of imports and defaultNS definitions in KIM/config/owlim.ttl.
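The extraction step can also be done in Python instead of bash. The sketch below reads N-Triples lines, collects each distinct subject IRI once, and emits a generatedBy statement for it. The full IRI used for protons:generatedBy is an assumption (the namespace declaration is not shown in this guide), so check it against your PROTON files. The origin IRI is the one named above.

```python
import re

# Matches the subject IRI at the start of an N-Triples line.
SUBJECT_RE = re.compile(r"^\s*<([^>]+)>\s")

# ASSUMPTION: full IRI of protons:generatedBy; verify against your PROTON copy.
ASSUMED_GENERATED_BY = "http://proton.semanticweb.org/2006/05/protons#generatedBy"
DBPEDIA_PAGE = "http://dbpedia.org/page/DBpedia"


def generated_by_triples(nt_lines, predicate=ASSUMED_GENERATED_BY,
                         origin=DBPEDIA_PAGE):
    """Return one generatedBy triple per distinct subject IRI, in input order."""
    seen = set()
    triples = []
    for line in nt_lines:
        match = SUBJECT_RE.match(line)
        if not match:
            continue  # blank lines, comments, blank-node subjects
        subject = match.group(1)
        if subject in seen:
            continue
        seen.add(subject)
        triples.append(f"<{subject}> <{predicate}> <{origin}> .")
    return triples


if __name__ == "__main__":
    sample = [
        '<http://dbpedia.org/resource/ABBA> <http://xmlns.com/foaf/0.1/name> "ABBA" .',
        '<http://dbpedia.org/resource/ABBA> <http://xmlns.com/foaf/0.1/name> "Abba" .',
    ]
    print("\n".join(generated_by_triples(sample)))
```

To process the real file, pass an open file object for dbpedia_instances.nt as nt_lines and write the returned lines to dbpedia_generated_by.n3.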
Copyright 2011 Ontotext AD
3.1.4 Managing Labels
Labels are very important for entity recognition in text. The KIM data model relies on the properties rdfs:label and protons:mainLabel. Therefore, we advise you to set meaningful values for rdfs:label and protons:mainLabel for each instance.
But first, configure the property ENTITY_DESCR in KIM/config/install.properties.
The ENTITY_DESCR property determines the way the semantic repository stores the entity labels. It can be set to aliases or labels. Using labels is generally the preferred approach, as it is simpler and more efficient. Aliases are used in more complex cases, where you also need to keep metadata for the specific labels. Therefore, make sure this property is set to Labels.
After that, look at dbpedia_instances.nt. You can see that all instance labels are defined with the predicate foaf:name. Create an rdfs:label statement for each foaf:name statement. You can do this in several ways.
- Use inference rules to create the new statements, if you see a pattern. The rule goes in the OWLIM inference rule definitions in KIM/context/default/kb/KIMRules.pie, and this is what it looks like:
e
ename
----------------------
ename
- Tell OWLIM that foaf:name and rdfs:label are the same:
foaf:name owl:sameAs rdfs:label .
This will cause OWLIM to create an rdfs:label statement for every foaf:name statement and vice versa.
- In general, it is recommended to use explicit statements. So create a file with explicit label definitions called dbpedia_labels.n3 and put it in the dbpedia folder. Update the definitions in KIM/config/owlim.ttl.
rdfs:label "Arsenal" .
rdfs:label "Arsenal Football Club" .
rdfs:label "The Gunners" .
.......
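For the explicit-statements option, the rdfs:label statements can be derived mechanically from the foaf:name statements. The sketch below is one possible way to do it in Python. It assumes that dbpedia_instances.nt holds one triple per line in N-Triples form with full IRIs. The function name is illustrative.

```python
FOAF_NAME = "<http://xmlns.com/foaf/0.1/name>"
RDFS_LABEL = "<http://www.w3.org/2000/01/rdf-schema#label>"


def labels_from_names(nt_lines):
    """Return an rdfs:label triple for every foaf:name triple in the input."""
    labels = []
    for line in nt_lines:
        # Split into subject, predicate and the rest (object plus final dot).
        parts = line.strip().split(None, 2)
        if len(parts) == 3 and parts[1] == FOAF_NAME:
            labels.append(f"{parts[0]} {RDFS_LABEL} {parts[2]}")
    return labels


if __name__ == "__main__":
    sample = ['<http://dbpedia.org/resource/Arsenal_F.C.> '
              '<http://xmlns.com/foaf/0.1/name> "The Gunners" .']
    print("\n".join(labels_from_names(sample)))
```

Only the predicate is swapped; the subject and the literal (including any language tag) are copied unchanged, so the output can be written directly to dbpedia_labels.n3.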
Finally, you have to consider protons:mainLabel. It serves as the primary representation of an entity in the graphical interface. protons:mainLabel is actually a subproperty of rdfs:label. Therefore, an rdfs:label statement will be inferred for every protons:mainLabel statement.
Define the protons:mainLabel for each instance in the ontology:
@prefix protons: .
protons:mainLabel "ABBA" .
protons:mainLabel "Asia" .
........
Put the definitions in the dbpedia folder in a file called dbpedia_main_labels.n3 and add the file to the list of definitions in KIM/config/owlim.ttl.
Now you come to the stage where you incorporate DBpedia in the default KIM IE pipeline and make the pipeline aware of the knowledge base for this new mapped ontology.
3.2 Incorporating DBpedia in the default KIM IE pipeline
The default KIM IE pipeline is a customized GATE pipeline. The processing resource that actually finds the entity mentions in texts is the so-called Large Knowledge Base Gazetteer (LKB Gazetteer).
3.2.1 Loading the gazetteer lists
In the default configuration, KIM comes with a working gazetteer, which loads its dictionaries using a SPARQL query over the RDF data. In the general case, making the gazetteer use the new entities is just a matter of changing the query contained in KIM/config/query.txt. The DBpedia case, however, is a little more complex, because the ontology has a rich vertical structure. Therefore, we suggest that you set up a new gazetteer for each major class of objects you want to recognize.
First, construct the query to load the gazetteer lists:
PREFIX rdfs:
PREFIX protont:
PREFIX protons:
SELECT ?la ?entity ?cl
WHERE {
  ?entity a ?cl ; rdfs:label ?la ;
          protons:generatedBy .
  ?cl rdfs:subClassOf protont:Person .
  OPTIONAL
  {
    ?sc rdfs:subClassOf ?cl .
    ?entity a ?sc .
    FILTER (?cl != ?sc)
  }
  FILTER (!bound(?sc) && isURI(?cl))
}
It will return a list of all instances of class protont:Person and its subclasses. dbpedia:Person is one of its subclasses, so its instances will be included in the results. The only requirement for the query is to return the label, the instance URI and the class URI, in this order.
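The OPTIONAL block combined with the !bound(?sc) filter keeps only the most specific class of each entity: a row is dropped as soon as the entity also has a type that is a proper subclass of ?cl. The Python sketch below mimics that selection on in-memory data to make the effect visible. It is an illustration and not how the gazetteer evaluates the query. It assumes that the subclass pairs already contain the transitive closure, as an inferencing repository would provide, and it leaves out the generatedBy and isURI conditions. All identifiers in the sample data are made up for the example.

```python
def most_specific_rows(labels, types, subclass_pairs, root):
    """Return (label, entity, class) rows using each entity's most specific class.

    labels:         dict entity -> list of labels
    types:          dict entity -> set of classes (asserted and inferred)
    subclass_pairs: set of (subclass, superclass) pairs, transitively closed
    root:           class whose instances are wanted, e.g. "protont:Person"
    """
    rows = []
    for entity, classes in types.items():
        for cl in classes:
            # ?cl rdfs:subClassOf root (the root itself also qualifies)
            if cl != root and (cl, root) not in subclass_pairs:
                continue
            # OPTIONAL + !bound(?sc): skip if a more specific type exists
            if any(sc != cl and (sc, cl) in subclass_pairs for sc in classes):
                continue
            for label in labels.get(entity, []):
                rows.append((label, entity, cl))
    return rows


if __name__ == "__main__":
    pairs = {
        ("dbpedia:Person", "protont:Person"),
        ("dbpedia:Philosopher", "dbpedia:Person"),
        ("dbpedia:Philosopher", "protont:Person"),
    }
    types = {"ex:Aristotle": {"protont:Person", "dbpedia:Person",
                              "dbpedia:Philosopher"}}
    labels = {"ex:Aristotle": ["Aristotle"]}
    print(most_specific_rows(labels, types, pairs, "protont:Person"))
```

Each returned tuple follows the order the gazetteer requires: label, instance URI, class URI.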
When you create the query, you can use tools to see the actual results that will be loaded in the gazetteer dictionary, such as JVisualVM with the JConsole extension, or a simple web service call like the one described in the documentation:
http://www.ontotext.com/kim/getting-started/documentation (WS API section)
Create the folder KIM/context/default/resources/gazetteer/dbpedia-person and put the query in a file named query.txt there.
Clear the caches.
Whenever you make changes that concern the ontology, you should remove the OWLIM image by deleting KIM/context/default/populated. When you start KIM again, it will generate a fresh image.
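If you clear the cache often, the deletion can be scripted. The following is a small Python sketch, assuming KIM is stopped and that the installation folder is passed in as kim_home. The function name is hypothetical.

```python
import shutil
from pathlib import Path


def clear_owlim_image(kim_home):
    """Delete <kim_home>/context/default/populated if it exists.

    Returns True if the folder was removed and False if there was nothing
    to remove. Run this only while KIM is not running.
    """
    populated = Path(kim_home) / "context" / "default" / "populated"
    if populated.is_dir():
        shutil.rmtree(populated)
        return True
    return False
```

For example, clear_owlim_image("KIM") removes KIM/context/default/populated relative to the current directory, and the next KIM start rebuilds the image.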
3.2.2 Setting up the gazetteer processing resource
Open the KIM GATE interface by running KIM/bin/kim gate
Create a Large KB Gazetteer resource with the following setup:
Figure 2. LKB Gazetteer setup
A brief description of the properties:
- annotationLimit: when the gazetteer has created the number of Lookups indicated in this property value, it stops
- caseSensitive: whether the matching is case sensitive or not
- dictFeederClass: set to com.ontotext.kim.model.KimDictionaryFeederImpl
- dictFeederParams: set the directory you created for Person, FeedSetupPath=$relpath$resources/gazetteer/dbpedia-person
- dynamicDictEnabled: set to false
- feedTransformerStages: additional transformations over the terms
- outputASName: the Lookup annotations are created in this set, Person
- relpath: set this to KIM/context/default/resources
- staticDictEnabled: set this to true
- staticDictSerializationPath: the cache is stored here
When the gazetteer initializes for the first time, it looks for a file named query.txt in the folder set in dictFeederParams. The gazetteer reads the query from there and initializes its dictionary. Both SPARQL and SeRQL can be used. When you design your query, it is important to keep the exact order and meaning of the query results: label, instance URI, direct class. The variable names are not important.
The outputASName is set to Person. This means that the gazetteer will create its Lookup annotations in this annotation set. This is how you will differentiate between the concepts recognized by this gazetteer and those recognized by other gazetteers.
Afterwards, save the application state to KIM/context/default/resources/IE.gapp.
If you want the gazetteer to create its dictionaries anew, you must remove the cache from the folder you have set in staticDictSerializationPath, in this case KIM/context/default/populated/gazetteer-person.
3.2.3 Adding the gazetteer to the IE pipeline and seeing it in action
Now the resource is loaded into memory, and you have to add it to the pipeline.
Figure 3. Adding the processing resource to the IE pipeline
Create a document from the Wikipedia article about Aristotle (http://en.wikipedia.org/wiki/Aristotle). Add it to the corpus and run the pipeline over it. You will be able to see your new gazetteer in action:
Figure 4. Lookup annotations in the GATE GUI
3.3 Creating annotations (Grammar Rules)
The IE pipeline creates annotations. The gazetteer creates Lookup annotations, but there are also other temporary annotations whose only role is to help other processing resources and rules create meaningful annotations at the end. KIM has a whitelist with all these meaningful annotation types in the property IE_ANN_TYPES in KIM/config/nerc.properties. At the end of the processing of a document, all annotations not in this list are removed.
Now the task is to transform all Lookups from the Person annotation set into Person annotations in the default annotation set. The easiest way is with a JAPE rule, which looks like this:
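// Input lists the annotation types this phase matches on. The Person
// annotation set is selected with the transducer's inputASName parameter.
Phase: dbpedia_person
Input: Lookup
Options: control = all
Rule: dbpedia_person
({Lookup}):match
-->
:match.Person = { class = :match.Lookup.class, inst = :match.Lookup.inst, rule = dbpedia_person }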
The rule does not transform the existing Lookup annotations. It creates new Person annotations over the same phrases as the lookups.
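The effect of the rule, followed by the whitelist clean-up at the end of document processing, can be pictured with a small Python model. This is only an illustration of the data flow and does not use the GATE API: annotation sets are modelled as a dictionary of lists, the default annotation set is stored under the empty-string key, and the class and function names are invented for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Annotation:
    type: str
    start: int
    end: int
    features: dict = field(default_factory=dict)


def apply_person_rule(annotation_sets):
    """For each Lookup in the "Person" set, add a Person annotation over the
    same span to the default set (key ""). The Lookups are left untouched."""
    default_set = annotation_sets.setdefault("", [])
    for ann in annotation_sets.get("Person", []):
        if ann.type != "Lookup":
            continue
        default_set.append(Annotation(
            "Person", ann.start, ann.end,
            {"class": ann.features.get("class"),
             "inst": ann.features.get("inst"),
             "rule": "dbpedia_person"}))
    return annotation_sets


def keep_whitelisted(annotations, whitelist):
    """Model of the final clean-up: drop annotation types not in the whitelist."""
    return [a for a in annotations if a.type in whitelist]


if __name__ == "__main__":
    sets = {"Person": [Annotation("Lookup", 0, 9,
                                  {"class": "dbpedia:Philosopher",
                                   "inst": "ex:Aristotle"})]}
    apply_person_rule(sets)
    print(sets[""])
```

In the model, the new Person annotation survives the clean-up only if "Person" is in the whitelist, which corresponds to the IE_ANN_TYPES property described above.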
Put this grammar in a file called dbpedia_person.jape and store it in KIM/context/default/resources/grammar/dbpedia. Then create a JAPE transducer processing resource for this grammar in the GATE UI and place it after the new gazetteer in the pipeline. The important thing here is to set the inputASName parameter to the new annotation set (Person) and leave the outputASName parameter empty. This makes the rule match annotations from the Person annotation set and create annotations in the default annotation set.
Figure 5. The JAPE transducer processing resource in the GATE UI
Run the pipeline over your test document and see how it creates new annotations of type Person in the default annotation set.
Store the changes you made in GATE, so that KIM will be able to use them later. You can do this by right-clicking on the pipeline and choosing Save Application State. Then save it over the application descriptor file KIM uses, which in this case is KIM/context/default/resources/IE.gapp. The exact path is set in the IE_APP property in KIM/config/nerc.properties.
3.4 The web interface
When you run KIM with the changed pipeline and annotate some documents, the new entities will appear in the web interface:
Figure 6. The new entities in the KIM web UI
3.5 Changing the visibility of resources
KIM uses a visibility mechanism to determine which classes and properties can be seen and used in the web interface. The new resources have to be defined as visible in KIM/context/default/kb/visibility.nt. Here is an excerpt:
"".
"".
........
Clear the cache of KIM (remove KIM/context/default/populated) and start it again. You should be able to see the additional classes in the ontology view:
Figure 7. The new ontology in the KIM Web UI
We have shown how to utilize only the Person class from the new ontology. The process is identical for the other classes.
Described here is only a single approach for adopting a new ontology into KIM. Every ontology is unique and may require a different setup. For example, you can use only a single gazetteer to create Lookup annotations for all the instances, and then JAPE rules to determine their real annotation type. This requires only changing the default gazetteer query in KIM/config/query.txt and writing some JAPE rules.