How well does your Instance Matching system perform? Experimental evaluation with LANCE

Tzanina Saveta, Evangelia Daskalaki, Giorgos Flouris, Irini Fundulaki
Institute of Computer Science – FORTH, Greece

Axel-Cyrille Ngonga Ngomo
IFI/AKSW, University of Leipzig, Germany

ISWC 2016, 10/31/16

Why Instance Matching?

Different sources contain different descriptions of the same real-world entity

* Adapted from Suchanek & Weikum tutorial @ SIGMOD 2013

Instance Matching for Linked Data

•  A set of RDF triples constitutes an RDF graph
•  Sparse data
•  Rich semantics expressed in terms of ontologies
•  Large number of sources to integrate
•  Value, structure and semantics heterogeneities

* Adapted from Suchanek & Weikum tutorial @ SIGMOD 2013

Benchmarking

Instance matching has led to the development of a number of matching techniques and tools

•  How to compare those?
•  How to assess their performance (efficiency and effectiveness)?
•  How to “push” systems into becoming better?
•  Benchmark your systems!

Instance Matching Benchmark Components

•  Datasets
   –  The source and target datasets that will be matched to find the entities that refer to the same real-world object
•  Ground truth / Gold standard / Reference alignment
   –  The “correct answer sheet” used to judge the completeness and soundness of the results produced by the system under test (SUT)
•  Organized into test cases, each addressing a different kind of instance matching requirement
•  Metrics
   –  The performance metric(s) that determine the systems’ efficiency and effectiveness
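The slides do not name the effectiveness metrics. A minimal sketch, assuming the usual choice of precision (soundness), recall (completeness) and F-measure computed against the gold standard, could look as follows. The function name `evaluate` and the sample identifiers are hypothetical, chosen only for illustration.

```python
def evaluate(system_matches, gold_standard):
    """Return (precision, recall, f_measure) for sets of (source, target) pairs.

    Assumption: a match is correct only if the exact pair is in the gold standard.
    """
    system_matches = set(system_matches)
    gold_standard = set(gold_standard)
    true_positives = system_matches & gold_standard

    # Precision: share of reported matches that are correct (soundness).
    precision = len(true_positives) / len(system_matches) if system_matches else 0.0
    # Recall: share of gold-standard matches that were found (completeness).
    recall = len(true_positives) / len(gold_standard) if gold_standard else 0.0

    if precision + recall == 0:
        f_measure = 0.0
    else:
        f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical gold standard and system output.
gold = {("s1", "t1"), ("s2", "t2"), ("s3", "t3"), ("s4", "t4")}
found = {("s1", "t1"), ("s2", "t2"), ("s5", "t9")}

precision, recall, f_measure = evaluate(found, gold)
print(precision, recall, f_measure)
```

Here two of the three reported pairs are correct and two of the four gold-standard pairs are found, so precision is 2/3 and recall is 1/2.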

LANCE

•  A novel instance matching benchmark generator
•  Domain-independent
•  Highly configurable and scalable
•  Standard value-based and structure-based test cases
•  Advanced semantics-aware test cases considering OWL 2 expressive constructs
•  Rich weighted gold standard
•  Additional metrics: similarity score metric

LANCE Architecture

[Figure: LANCE architecture diagram. Labelled components: RDF Repository, Initialization Module, Resource Generator, Test Case Generator, Resource Transformation Module, Weight Computation Module, RESCAL [NT12], Matcher, Sampler. Labelled data: Source Data, Test Case Generation Parameters, Matched Instances, Target Data, Weighted Gold Standard.]

Test Cases

Test cases are built using a variety of transformations

•  Value-based test cases
   –  Transformations of the values of datatype properties
•  Structure-based test cases
   –  Transformations of the structure of object and datatype properties
•  Semantics-aware test cases
   –  Transformations at the instance level, considering the schema
•  Simple and complex combinations of the first three categories
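To make the value-based category concrete, the sketch below applies one possible transformation, a typo that swaps two adjacent characters, to the literal of a datatype property. This is an illustration only: the slides do not list the transformations LANCE implements, and the function names, the `ex:` identifiers and the tuple representation of a triple are all hypothetical.

```python
import random


def swap_adjacent_characters(value, rng):
    """Return `value` with one randomly chosen pair of adjacent characters swapped."""
    if len(value) < 2:
        return value
    i = rng.randrange(len(value) - 1)
    chars = list(value)
    chars[i], chars[i + 1] = chars[i + 1], chars[i]
    return "".join(chars)


def transform_value(triple, rng):
    """Apply the typo to the literal only; subject and predicate stay unchanged."""
    subject, predicate, literal = triple
    return (subject, predicate, swap_adjacent_characters(literal, rng))


# A seeded generator keeps the generated test case reproducible.
rng = random.Random(42)
source_triple = ("ex:work1", "ex:title", "Instance Matching")
target_triple = transform_value(source_triple, rng)
print(source_triple)
print(target_triple)
```

A structure-based transformation would instead change which properties an instance carries, for example by deleting or splitting a property, rather than editing a literal.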

LANCE Performance Metrics

•  Average similarity score: average difficulty of the matched instances
   –  Benchmark with a high average similarity score: matched instances are easier to find
•  Standard deviation: spread of the similarity scores of the matched instances
   –  Benchmark with a high standard deviation:
      •  scores are spread out from the average
      •  more heterogeneity among the matched instances

Obtain a more fine-grained understanding of the IM system’s performance by comparing the average similarity score and standard deviation of the system with those of the benchmark
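The comparison described above can be sketched as follows. This is a minimal illustration, assuming the weighted gold standard maps each matched pair to a similarity score in [0, 1], and that the system's statistics are taken over its true positives only. The data, the names, and the use of the population standard deviation are assumptions, not details taken from the slides.

```python
from statistics import mean, pstdev


def summarize(similarity_scores):
    """Return (average, population standard deviation) of the given scores."""
    scores = list(similarity_scores)
    if not scores:
        raise ValueError("no similarity scores to summarize")
    return mean(scores), pstdev(scores)


# Hypothetical weighted gold standard: matched pair -> similarity score.
weighted_gold = {
    ("s1", "t1"): 0.95,
    ("s2", "t2"): 0.90,
    ("s3", "t3"): 0.75,
    ("s4", "t4"): 0.70,
}
# Hypothetical system output; ("s9", "t9") is a false positive and is ignored.
system_matches = {("s1", "t1"), ("s2", "t2"), ("s9", "t9")}

true_positive_scores = [
    weighted_gold[pair] for pair in system_matches if pair in weighted_gold
]

benchmark_avg, benchmark_std = summarize(weighted_gold.values())
system_avg, system_std = summarize(true_positive_scores)

print(f"benchmark: avg={benchmark_avg:.3f} std={benchmark_std:.3f}")
print(f"system:    avg={system_avg:.3f} std={system_std:.3f}")
```

In this toy data the system's average is higher than the benchmark's and its standard deviation is lower, which would suggest that it found mainly the easier matches.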

Experiments

•  Efficiency and effectiveness of IM systems using LANCE benchmarks
   –  Systems
      •  LogMap version 2.4 [JG11] (MORe reasoner [RG13])
      •  OtO [DP12]
      •  LIMES (EAGLE IM algorithm [NL12])
   –  Datasets
      •  LDBC’s SPIMBENCH generator (Semantic Publishing Benchmark)
      •  UOBM
   –  Matching task
      •  All 5 categories introduced previously
      •  All instances were transformed

SPIMBENCH: Standard Metrics

•  LogMap
   –  Responds well in the value-based test cases
   –  Reduced performance when semantics-aware test cases were also applied

SPIMBENCH: Standard Metrics (continued)

•  OtO and EAGLE
   –  Give good results for the value-based transformations
   –  Reduced performance in the remaining categories
•  EAGLE is non-deterministic and uses unsupervised learning

UOBM: Standard Metrics

•  LogMap
   1. Does not perform well in any of the categories
   2. Performance is not affected by the dataset size
•  OtO
   1. Performs better than LogMap
   2. Reduced performance when the dataset size increases

SPIMBENCH: Additional Metrics

Distribution of similarity scores for LANCE and for the true positive matches from the IM systems, for semantics-aware test cases on the 10K triples dataset.

•  LogMap can address difficult test cases
•  EAGLE & OtO can address mostly value-based test cases

[Figure: distribution of similarity scores. x-axis: similarity scores (0.7 to 1, in steps of 0.02); y-axis: log(# of mappings); series: OtO, EAGLE, LogMap, LANCE; additional label: Standard Deviation.]
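A distribution like the one plotted can be computed by binning similarity scores and taking the logarithm of each bin's count. The sketch below is illustrative only: the bin width of 0.02 is read off the axis ticks, the logarithm base is not stated in the slides and base 10 is assumed here, and the sample scores are made up.

```python
import math
from collections import Counter


def log_histogram(scores, bin_width=0.02):
    """Map each non-empty bin's lower edge to log10 of the number of scores in it."""
    counts = Counter()
    for score in scores:
        # The small epsilon guards against floating-point error at bin edges.
        counts[math.floor(score / bin_width + 1e-9)] += 1
    return {
        round(index * bin_width, 2): math.log10(count)
        for index, count in sorted(counts.items())
    }


# Hypothetical similarity scores of a system's true positive matches.
scores = [0.70, 0.71, 0.90] + [0.98] * 10

histogram = log_histogram(scores)
for lower_edge, log_count in histogram.items():
    print(f"{lower_edge:.2f}  {log_count:.3f}")
```

Computing one such histogram per system, and one for all pairs in the LANCE weighted gold standard, gives the series that the figure compares.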

UOBM: Additional Metrics

Distribution of similarity scores for LANCE and for the true positive matches from the IM systems, for structure-based test cases on the 10K triples dataset.

•  LogMap does not cope well with changes to the URIs of the instances

[Figure: distribution of similarity scores. x-axis: similarity (0.6 to 0.9, in steps of 0.02); y-axis: log(# of mappings); series: OtO, LogMap, LANCE.]

Lessons Learned

•  Different types of transformations affect an IM system’s performance
•  The characteristics of the source datasets affect the behavior of IM systems

Questions?

Acknowledgments

This project has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No 688227.

References

[JG11] E. Jimenez-Ruiz and B. C. Grau. LogMap: Logic-based and scalable ontology matching. In ISWC, 2011.
[RG13] A. A. Romero, B. C. Grau, et al. MORe: a Modular OWL Reasoner for Ontology Classification. In ORE, pages 61-67, 2013.
[DP12] E. Daskalaki and D. Plexousakis. OtO Matching System: A Multi-strategy Approach to Instance Matching. In CAiSE, 2012.
[NL12] A.-C. Ngonga Ngomo and K. Lyko. EAGLE: Efficient Active Learning of Link Specifications using Genetic Programming. In ESWC, 2012.
