A Semantic Web View on Concepts and their Alignments

Antoine Isaac

Vrije Universiteit Amsterdam

Europeana

Concepts in Context, Köln, July 19th 2010

Linked Data Principles

1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful information using standards (RDF, SPARQL)
4. Include links to other URIs, so that they can discover more things.

Tim Berners-Lee, http://linkeddata.org/
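Principles 2 and 3 can be illustrated with a minimal sketch of an HTTP lookup that asks for RDF. The URI `http://example.org/id/cats` is a hypothetical name, and the request is only prepared, not sent, so the sketch makes no claim about what any real server returns.

```python
import urllib.request


def build_rdf_request(uri: str) -> urllib.request.Request:
    """Prepare (but do not send) an HTTP GET asking for an RDF representation."""
    # The Accept header is how a client states which format it wants back.
    return urllib.request.Request(uri, headers={"Accept": "application/rdf+xml"})


# Hypothetical URI naming a concept; any HTTP URI could be used instead.
request = build_rdf_request("http://example.org/id/cats")
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual lookup.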

A way to publish Semantic Web data

A web of data

• Publish and re-use data via the web, building innovative applications over former data silos

• Principle #4 is crucial to this vision: Include links to other URIs, so that they can discover more things.

http://linkeddata.org/

SKOS, Knowledge Organization Systems and Linked Data

SKOS allows representing (simple) KOS data as RDF

animals
  NT cats

cats
  UF domestic cats
  RT wildcats
  BT animals
  SN used only for domestic cats

domestic cats
  USE cats

wildcats
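As a rough sketch of what "representing KOS data as RDF" can mean for the thesaurus entries above, the following maps each relation code to a SKOS property and emits triples as plain tuples. The `http://example.org/kos/` namespace and the helper names are assumptions made for illustration; the non-preferred entry "domestic cats USE cats" is covered by the UF line on "cats", which becomes a skos:altLabel.

```python
SKOS = "http://www.w3.org/2004/02/skos/core#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
BASE = "http://example.org/kos/"  # hypothetical namespace for the example

# Thesaurus relation code -> (SKOS property, whether the value is a concept)
RELATIONS = {
    "NT": ("narrower", True),
    "BT": ("broader", True),
    "RT": ("related", True),
    "UF": ("altLabel", False),
    "SN": ("scopeNote", False),
}

# The preferred terms from the slide, with their relations.
ENTRIES = {
    "animals": [("NT", "cats")],
    "cats": [
        ("UF", "domestic cats"),
        ("RT", "wildcats"),
        ("BT", "animals"),
        ("SN", "used only for domestic cats"),
    ],
    "wildcats": [],
}


def concept_uri(term):
    """Mint a URI for a preferred term (an illustrative convention only)."""
    return BASE + term.replace(" ", "_")


def to_triples(entries):
    """Turn thesaurus entries into (subject, predicate, object) tuples."""
    triples = []
    for term, relations in entries.items():
        subject = concept_uri(term)
        triples.append((subject, RDF_TYPE, SKOS + "Concept"))
        triples.append((subject, SKOS + "prefLabel", term))
        for code, value in relations:
            prop, is_concept = RELATIONS[code]
            obj = concept_uri(value) if is_concept else value
            triples.append((subject, SKOS + prop, obj))
    return triples


triples = to_triples(ENTRIES)
for triple in triples:
    print(triple)
```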

SKOS, KOSs and LD

SKOS allows bridging across KOSs from different contexts

http://www.w3.org/2004/02/skos/

Some landmark KOS LD implementations

• Many libraries (not a surprise!)
– Swedish National Library’s Libris catalogue and thesaurus http://libris.kb.se/
– Library of Congress’ vocabularies, including LCSH http://id.loc.gov/
– DNB’s Gemeinsame Normdatei (incl. SWD subject headings) http://d-nb.info/gnd/ (documentation at https://wiki.d-nb.de/display/LDS)
– BnF’s RAMEAU subject headings http://stitch.cs.vu.nl/
– OCLC’s DDC classification http://dewey.info/ and VIAF http://viaf.org/
– STW economy thesaurus http://zbw.eu/stw
– National Library of Hungary’s catalogue and thesauri http://oszkdk.oszk.hu/resource/DRJ/404 (example)

• Other fields
– Wikipedia categories through DBpedia http://dbpedia.org/
– New York Times subject headings http://data.nytimes.com/
– IVOA astronomy vocabularies http://www.ivoa.net/Documents/latest/Vocabularies.html
– GEMET environmental thesaurus http://eionet.europa.eu/gemet
– UMTHES
– Agrovoc http://aims.fao.org/
– Linked Life Data http://linkedlifedata.com/
– Taxonconcept http://www.taxonconcept.org/
– UK Public sector vocabularies http://standards.esd.org.uk/ (e.g., http://id.esd.org.uk/lifeEvent/7 )

KOS Alignments?

Quite a few of them are linked to some other resource
• LCSH, SWD and RAMEAU interlinked through MACS mappings
• GND linked to DBpedia and VIAF
• Libris linked to LCSH
• Agrovoc to CAT, NAL, SWD, GEMET
• NYT to Freebase, DBpedia, Geonames
• DBpedia links are overwhelming
– Hungary, STW, TaxonConcept, GND…

Is that enough? Are these links any good?

[Cyganiak, Jentzsch] http://linkeddata.org/

Sparse linkage: the LD cloud

[Guéret, 2010] http://blog.larkc.eu/?p=1941

Sparse linkage: another view

Linked Data Issues

Mike Uschold’s “semantic elephants”
• Proliferation of URIs, managing coreference
• Versioning and URIs
• Overloading owl:sameAs

http://lists.w3.org/Archives/Public/public-lod/2010May/0012.html

What kind of links?

Coreference links are the most used (and needed)
• owl:sameAs
• skos:exactMatch
• skos:closeMatch
• rdfs:seeAlso
• umbel:isLike

Overloading owl:sameAs

• Formally, two URIs linked by owl:sameAs are inferred to have the same properties
ex:a name "Antoine Isaac" .
ex:b owl:sameAs ex:a .
implies
ex:b name "Antoine Isaac" .

• Many owl:sameAs statements are asserted between resources that are only very similar [Halpin 2009]
The same resource but in different contexts, a reference…
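The inference described above can be sketched in a few lines. This is a deliberately simplified illustration, not an OWL reasoner: it only copies statements in which a linked URI is the subject, and it treats triples as plain string tuples.

```python
SAME_AS = "owl:sameAs"


def same_as_closure(triples):
    """Copy properties across owl:sameAs links until nothing new is inferred."""
    inferred = set(triples)
    changed = True
    while changed:
        changed = False
        links = [(s, o) for s, p, o in inferred if p == SAME_AS]
        for a, b in links:
            # owl:sameAs is symmetric, so properties flow in both directions.
            for source, target in ((a, b), (b, a)):
                for s, p, o in list(inferred):
                    if s == source and p != SAME_AS:
                        new_triple = (target, p, o)
                        if new_triple not in inferred:
                            inferred.add(new_triple)
                            changed = True
    return inferred


asserted = {
    ("ex:a", "name", "Antoine Isaac"),
    ("ex:b", SAME_AS, "ex:a"),
}
result = same_as_closure(asserted)
print(("ex:b", "name", "Antoine Isaac") in result)
```

The same mechanism explains the overloading problem: any statement attached to one URI, including context-specific ones, ends up on the other.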

Case study: New York Times

• 10K concepts (places, descriptors, persons, organizations) http://data.nytimes.com

• Manually or automatically mapped by NYT staff to DBpedia, Freebase, Geonames
Linking the LD cloud to NYT articles!
Makes it easy to mix NYT content with other content

• Started with quite messy modeling:

http://data.nytimes.com/60694995023816375851 dcterms:rightsHolder The New York Times Company .
http://data.nytimes.com/60694995023816375851 owl:sameAs http://dbpedia.org/resource/Park_Slope%2C_Brooklyn .

Clearer KOS alignments (1)

What is being aligned?
Concepts, documents, real-world entities “out there” (persons, places…)

• In principle owl:sameAs should not be applied across disjoint categories

• But even for one category there can be issues
– Two KOS concepts representing the same notion but with different management metadata attached (skos:changeNote)

Clearer KOS alignments (2)

How is it aligned? Distinguish:
• exact co-reference
• conceptual similarity, including equivalence
• classification

• Making clearer distinctions between conceptual links
– skos:narrowMatch, skos:broadMatch, skos:relatedMatch

• Minimize ontological commitment for KOS data consumers
– skos:exactMatch: concepts can be used interchangeably across a wide range of information retrieval applications. skos:exactMatch is a transitive property
– skos:closeMatch: In order to avoid the possibility of "compound errors" when combining mappings across more than two concept schemes, skos:closeMatch is not declared to be a transitive property
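The practical difference between the two properties shows up when mappings are chained across three concept schemes. The sketch below is a simplified illustration with hypothetical concept identifiers: exact matches are closed under symmetry and transitivity, close matches only under symmetry.

```python
def symmetric(pairs):
    """Add the reverse direction of every link."""
    return set(pairs) | {(b, a) for a, b in pairs}


def exact_match_closure(pairs):
    """Symmetric and transitive closure, as licensed for skos:exactMatch."""
    closure = symmetric(pairs)
    changed = True
    while changed:
        changed = False
        for a, b in list(closure):
            for c, d in list(closure):
                if b == c and a != d and (a, d) not in closure:
                    closure.add((a, d))
                    changed = True
    return closure


def close_match_links(pairs):
    """skos:closeMatch is symmetric but not transitive: no chaining."""
    return symmetric(pairs)


# Hypothetical identifiers for the same chain of links in both cases.
chain = {("kos1:cats", "kos2:chats"), ("kos2:chats", "kos3:katzen")}

exact = exact_match_closure(chain)
close = close_match_links(chain)

print(("kos1:cats", "kos3:katzen") in exact)
print(("kos1:cats", "kos3:katzen") in close)
```

With close matches, a consumer who wants the link between the first and third scheme has to assert or verify it separately, which is exactly what prevents compound errors.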

Case study: New York Times (2)

Data quality has considerably improved
• Factual data is attached to the concept itself; management data is attached to the resource representing the data source (context)

http://data.nytimes.com/60694995023816375851 rdf:type skos:Concept ;
  skos:prefLabel "Park Slope (NYC)" ;
  geo:lat "40.6701033" ;
  owl:sameAs http://dbpedia.org/resource/Park_Slope%2C_Brooklyn .

http://data.nytimes.com/60694995023816375851.rdf dcterms:rightsHolder "The New York Times Company" ;
  foaf:primaryTopic http://data.nytimes.com/60694995023816375851 .

• Still, for resources linked with owl:sameAs, statements representing different modeling choices can be merged
The DBpedia resource might not be a skos:Concept, or might use a different latitude format

Clearer KOS alignments (3)

What is the alignment for?
• SKOS mapping properties use the notion of validity within one application context
• Application context for mapping has been investigated in thesaurus interoperability studies
• Application of alignments matters:
– STITCH application scenarios for Cultural Heritage: book re-indexing, thesaurus merging, query reformulation…
– The same alignment performs differently for different scenarios [Isaac 2008, Wang 2009]

Application-specific alignment evaluation

Example: OAEI 2007 campaign, 3 matching tools evaluated for thesaurus merging & book re-indexing

[Figure: two bar charts comparing the matching tools Falcon, Silas and DSSim on a 0% to 100% scale. First chart: Precision and Coverage. Second chart: Pa and Ra.]
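To make the two measures in the first chart concrete, here is a minimal sketch of how precision and coverage could be computed for one tool's output. It assumes a reference set of correct mappings is available, which is a simplification of the actual evaluation procedure; the mappings and the tool output are hypothetical and do not reproduce any campaign result.

```python
def precision(found, reference):
    """Share of the mappings a tool returned that are in the reference set."""
    return len(found & reference) / len(found) if found else 0.0


def coverage(found, reference):
    """Share of the reference mappings that the tool returned."""
    return len(found & reference) / len(reference) if reference else 0.0


# Hypothetical reference mappings and hypothetical tool output.
reference = {("a:1", "b:1"), ("a:2", "b:2"), ("a:3", "b:3")}
tool_output = {("a:1", "b:1"), ("a:2", "b:2"), ("a:4", "b:9"), ("a:5", "b:7")}

print(precision(tool_output, reference))
print(coverage(tool_output, reference))
```

A scenario such as book re-indexing would instead score the annotations that the mappings produce on books, which is why one alignment can rank differently across scenarios.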

Application-specific alignments

Why?

Take two thesauri at the National Library of the Netherlands: GTT and Brinkman

• For thesaurus merging, gtt:excavation should be aligned to brinkman:excavation

• For book re-indexing, gtt:excavation should be aligned to brinkman:archeology_netherlands

• This requires a finer-grained representation of the context in which the alignment is produced
– Who created it?
– Manual vs. automatic?
– Which alignment strategy or tool?
– Is there a degree of confidence?
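One way to picture such a finer-grained representation is a record that carries the context questions above alongside each mapping. The field names, creators, tool name and confidence values below are hypothetical; only the GTT and Brinkman concepts come from the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Mapping:
    source: str
    target: str
    scenario: str          # application the mapping was produced for
    creator: str           # who created it
    method: str            # "manual" or "automatic"
    tool: Optional[str] = None
    confidence: float = 1.0


def select(mappings: List[Mapping], scenario: str, min_confidence: float = 0.0) -> List[str]:
    """Return the targets of the mappings that fit a scenario and a confidence threshold."""
    return [
        m.target
        for m in mappings
        if m.scenario == scenario and m.confidence >= min_confidence
    ]


MAPPINGS = [
    Mapping("gtt:excavation", "brinkman:excavation",
            "thesaurus merging", "librarian (hypothetical)", "manual"),
    Mapping("gtt:excavation", "brinkman:archeology_netherlands",
            "book re-indexing", "matcher run (hypothetical)", "automatic",
            tool="example-matcher", confidence=0.8),
]

print(select(MAPPINGS, "thesaurus merging"))
print(select(MAPPINGS, "book re-indexing", min_confidence=0.5))
```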

Case study: New York Times (3)

• Using nyt:mapping_strategy property with nyt:manual or nyt:automatic:

http://data.nytimes.com/60694995023816375851.rdf nyt:mapping_strategy http://data.nytimes.com/elements/manual .

• Problem: it applies to the context file for the concept, not to the statement itself:

http://data.nytimes.com/60694995023816375851 owl:sameAs http://dbpedia.org/resource/Park_Slope%2C_Brooklyn .

• Using simple binary properties (skos:exactMatch…) between aligned resources does not allow for much flexibility
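For comparison, attaching the strategy to the statement itself requires turning the link into a resource of its own. The sketch below uses the standard RDF reification vocabulary (rdf:Statement, rdf:subject, rdf:predicate, rdf:object) as one possible option; it is not how the NYT data is published, and the blank node label `_:mapping1` is an arbitrary choice.

```python
def reify(statement_id, subject, predicate, obj, annotations):
    """Describe one triple as an rdf:Statement and attach metadata to it."""
    triples = [
        (statement_id, "rdf:type", "rdf:Statement"),
        (statement_id, "rdf:subject", subject),
        (statement_id, "rdf:predicate", predicate),
        (statement_id, "rdf:object", obj),
    ]
    # Each annotation becomes a property of the statement, not of the concept.
    triples.extend((statement_id, prop, value) for prop, value in annotations.items())
    return triples


reified = reify(
    "_:mapping1",  # arbitrary blank node label
    "http://data.nytimes.com/60694995023816375851",
    "owl:sameAs",
    "http://dbpedia.org/resource/Park_Slope%2C_Brooklyn",
    {"nyt:mapping_strategy": "http://data.nytimes.com/elements/manual"},
)
for triple in reified:
    print(triple)
```

One binary link becomes five triples, which gives a sense of the extra cost for publishers and consumers.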

Ontology Matching community practices

• Community investigating the ontology and vocabulary matching issues
Ontology Alignment Evaluation Initiative http://oaei.ontologymatching.org

• Matching tools produce some metadata
• Metadata repositories store and manage them
– BioPortal http://bioportal.bioontology.org/
– CATCH vocabulary and alignment repository http://stitch.cs.vu.nl/repository/
…

• Consensus: richer alignment metadata is needed

From a simple representation

to a more complete one

http://alignapi.gforge.inria.fr/edoal.html

Can LD accommodate complex representations?

• The strength of the LD vision lies in the relative simplicity of a standard representation

• LD provides a simple way to publish data and follow one’s nose to connected data
Serendipity!

• Reification and metadata on links are not really compatible with it
Higher barrier for data publication and consumption

Peaceful co-existence

• Applications with a narrow scope that require precise data can afford
– Selecting the alignments they consume
– Exploiting finer-grained representations
– Creating finer-grained representations

• Simple data for applications that are simple and/or exploit a wide range of datasets
– Simple mash-up applications robust to (limited) approximation
– Web-scale applications: large-scale document retrieval, concept discovery

Does it need to be perfect anyway?

• Do we really want to throw away crucial URI co-reference data?
http://sameAs.org has 35,187,488 URIs in 11,285,263 bundles

• Extensive linking to DBpedia is useful, even when the type of link is not used in the theoretically correct way
Cf. BBC content and data mash-ups http://www.bbc.co.uk/wildlifefinder/ http://www.bbc.co.uk/music/

• Issues with mixed quality are being tackled
– http://sameAs.org as a “service to provide you with help finding URIs”, keeping track of data sources
– Representation and exchange of provenance info is under active investigation
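The notion of a "bundle" of co-referent URIs can be sketched as grouping URIs connected by co-reference links. The union-find approach below is an illustrative assumption, not a description of how http://sameAs.org is implemented, and the identifiers are hypothetical.

```python
def bundles(links):
    """Group URIs connected by co-reference links into sorted bundles."""
    parent = {}

    def find(uri):
        # Register unseen URIs as their own root, then follow parents to the root.
        parent.setdefault(uri, uri)
        while parent[uri] != uri:
            parent[uri] = parent[parent[uri]]  # path halving keeps trees shallow
            uri = parent[uri]
        return uri

    for a, b in links:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    groups = {}
    for uri in list(parent):
        groups.setdefault(find(uri), set()).add(uri)
    return sorted(sorted(group) for group in groups.values())


# Hypothetical co-reference links gathered from several datasets.
LINKS = [
    ("nyt:park_slope", "dbpedia:Park_Slope"),
    ("dbpedia:Park_Slope", "freebase:park_slope"),
    ("nyt:other_place", "geonames:other_place"),
]

result = bundles(LINKS)
for bundle in result:
    print(bundle)
```

Keeping the source dataset next to each link, rather than only the resulting groups, is what would let a consumer decide which links to trust.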

Peaceful co-existence (2)

• If you have a complex representation, don’t be pedantic: publish simpler data, too!

• Articulation between LD (to discover links) and alignment repositories is needed

• Technically feasible, best practices have to be identified
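As a sketch of what publishing simpler data from a richer representation could look like, the function below projects alignment records onto plain SKOS mapping triples. The record fields, relation names, confidence values and threshold are assumptions made for the example, not an established best practice.

```python
# Hypothetical relation names used in the rich records -> SKOS mapping property
SIMPLE_PROPERTY = {
    "exact": "skos:exactMatch",
    "close": "skos:closeMatch",
    "broad": "skos:broadMatch",
    "narrow": "skos:narrowMatch",
    "related": "skos:relatedMatch",
}


def simplify(cells, min_confidence=0.0):
    """Drop the metadata and keep one binary link per sufficiently confident cell."""
    return [
        (cell["source"], SIMPLE_PROPERTY[cell["relation"]], cell["target"])
        for cell in cells
        if cell["confidence"] >= min_confidence
    ]


# Hypothetical rich alignment cells, as a repository might store them.
CELLS = [
    {"source": "gtt:excavation", "target": "brinkman:excavation",
     "relation": "exact", "confidence": 0.95, "method": "manual"},
    {"source": "gtt:excavation", "target": "brinkman:archeology_netherlands",
     "relation": "related", "confidence": 0.4, "method": "automatic"},
]

simple_links = simplify(CELLS, min_confidence=0.5)
print(simple_links)
```

The rich records stay in the alignment repository; only the simple links need to be exposed for discovery.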

Conclusions

• (Almost) any alignment is better than none
This is a web of data: without links there’s almost no value

• There is already great linking happening!

• More involvement from this community would certainly help!
Alignments themselves and theoretical foundations

Thanks!

Possible participation channels:
• Linked Open Data community (http://linkeddata.org) and mailing list ([email protected])
• Library Linked Data W3C incubator group (http://www.w3.org/2005/Incubator/lld/wiki/ ) and community list ([email protected])

References

• [Halpin 2009] Harry Halpin, Pat Hayes. When owl:sameAs isn't the Same: An Analysis of Identity Links on the Semantic Web. LDOW 2009

• [Isaac 2008] Antoine Isaac, Henk Matthezing, Lourens van der Meij, Stefan Schlobach, Shenghui Wang, Claus Zinn. Putting ontology alignment in context: usage scenarios, deployment and evaluation in a library case. ESWC 2008

• [Wang 2009] Shenghui Wang, Antoine Isaac, Balthasar Schopman, Stefan Schlobach, Lourens van der Meij. Matching multi-lingual subject vocabularies. ECDL 2009