Avoiding a Semantic Web Roadblock: URI Management and Ontology Evolution
DESCRIPTION
We highlight the importance of creating a set of guidelines for managing URIs during ontology evolution and when linking open data. We examine some potential and actual negative impacts of making the wrong decision. For example, the new version of SKOS changes the semantics of existing terms without changing their URIs. This places a heavy load on developers of ontology-driven applications, who must keep those applications from breaking. Alternatively, minting a whole set of new URIs when the meaning of most of the terms is unchanged causes an unnecessary proliferation of URIs that adds computational and conceptual overhead. We suggest a way forward based on examining two root causes of the problem: 1) URIs are overloaded and 2) there is no good technology for change management. As linked data grows and as applications are driven more and more by ontologies, the negative impacts of inadequate URI management could severely retard the growth of the semantic web.
TRANSCRIPT
Copyright © 2010 Michael Uschold. All rights reserved.
Avoiding a Semantic Web Road Block:
URI Management and Ontology Evolution
Michael Uschold, PhD: Independent Consultant
Friday 25 June 2010
Semantic Technology Conference, San Francisco, CA
Engineering, Operations & Technology | Phantom Works E&IT | Mathematics and Computing Technology
Outline
• Examples of linked data in the wild
• Problems
• Root Causes
• What to do?
Ontologies and Linked Data in the Wild: SKOS
Simple Knowledge Organization System (SKOS)
• Small vocabulary (20 terms)
• Evolve to new version
Changes:
• Majority of terms are the same
• Change semantics of broader: no longer transitive
Ontologies and Linked Data in the Wild: WordNet
WordNet: lexical database for English language
• Large vocabulary
• Evolve to new version
Changes:
• Majority of terms are the same
• Significant number of updates and changes
Ontologies and Linked Data in the Wild: Open Biomedical Ontologies
Open Biomedical Ontologies
• Very large vocabulary
• Interconnected ontologies
• Undergoing continual evolution (daily)
Changes:
• Majority of terms are the same
• Significant number of updates and changes
Versioning and URIs: Options
A. Mint all new URIs, even for unchanged terms.
B. Keep URIs the same, even when semantics changes.
C. Mint new URIs only for changed terms (including the ontology URI).
   a. Throw away old terms.
   b. Deprecate old terms for backwards compatibility.
(A) Mint all new URIs: Impacts
Usage Scenario
1. Application loads ontology O1 and data D1
2. New version: O2
   1. All new URIs
   2. No idea which terms have different semantics
3. New dataset D2, created and loaded into application
4. Query using old URIs
WRONG ANSWERS: Ignores data from new URIs
Maintenance headaches: find semantic matches
Performance problems: if owl:sameAs is used
Broken applications
Convenient for first time users.
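The usage scenario above can be sketched in a few lines of Python. This is only a toy illustration: the triple store is modelled as a set of tuples, and the namespaces `http://example.org/o1/` and `http://example.org/o2/`, the `ex:` items, and the `instances_of` helper are all hypothetical names invented for the example.

```python
# Toy illustration of option (A): every term gets a new URI in version 2.
# All namespaces and names below are hypothetical.
O1 = "http://example.org/o1/"
O2 = "http://example.org/o2/"
RDF_TYPE = "rdf:type"

# D1 was typed with the old ontology, D2 with the new one.
d1 = {("ex:item1", RDF_TYPE, O1 + "Calculator")}
d2 = {("ex:item2", RDF_TYPE, O2 + "Calculator")}
store = d1 | d2


def instances_of(triples, class_uri, same_as=None):
    """Return subjects typed with class_uri or with a URI declared equivalent to it."""
    accepted = {class_uri}
    if same_as:
        accepted |= {new for old, new in same_as if old == class_uri}
    return sorted(s for s, p, o in triples if p == RDF_TYPE and o in accepted)


# A query written against O1 silently ignores data typed with O2 URIs.
old_query = instances_of(store, O1 + "Calculator")

# The semantic matches have to be found by hand before the answer is complete.
matches = {(O1 + "Calculator", O2 + "Calculator")}
fixed_query = instances_of(store, O1 + "Calculator", same_as=matches)

print(old_query)
print(fixed_query)
```

The second query only works because someone supplied the `matches` set, which is the maintenance headache named on the slide; expanding every query through such equivalences is one plausible source of the performance cost.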
(B) Same URIs, Different Semantics: Impacts
Usage Scenario
1. Application loads ontology O1 and data D1
2. Create application functionality that depends on O1
3. New version: O2
   1. Some terms now have different semantics, but the same URIs
   2. No idea which terms have different semantics
4. New dataset D2, created and loaded into application
5. Run functionality that depends on O1 semantics
WRONG ANSWERS: mixing different semantics
Maintenance Headaches: find semantic matches
Broken Applications
Convenient for first time users.
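A small sketch of how this scenario can go wrong, using the SKOS `broader` change as the example. It assumes application code that was written when `broader` was treated as transitive, and a later dataset whose publisher only means direct links. The `ex:` concepts and both helper functions are hypothetical.

```python
# Toy illustration of option (B): the URI is unchanged while its semantics differs.
BROADER = "http://www.w3.org/2004/02/skos/core#broader"


def broader_closure(triples, concept):
    """Application logic written for O1: follows broader links transitively."""
    found, frontier = set(), [concept]
    while frontier:
        current = frontier.pop()
        for s, p, o in triples:
            if s == current and p == BROADER and o not in found:
                found.add(o)
                frontier.append(o)
    return found


def direct_broader(triples, concept):
    """What a publisher working to O2 means: direct links only."""
    return {o for s, p, o in triples if s == concept and p == BROADER}


# D2 is created under the new semantics (hypothetical concepts).
d2 = {
    ("ex:wheel", BROADER, "ex:car"),
    ("ex:car", BROADER, "ex:vehicle"),
}

assumed_by_app = broader_closure(d2, "ex:wheel")
intended_by_data = direct_broader(d2, "ex:wheel")
unintended = assumed_by_app - intended_by_data

print(sorted(unintended))
```

Nothing in the URI tells the application which reading applies, so the extra answers in `unintended` appear without any error being raised.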
(C) New URIs only for changed terms: Impacts
Usage Scenarios
1. No broken applications
2. No performance problems
3. No maintenance headaches
Inconvenience of having same ontology with multiple
namespaces.
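One way to picture option (C) with deprecation is the sketch below. It is a simplification under stated assumptions: term definitions are plain strings, string equality stands in for "same semantics", and the namespaces `NS1` and `NS2` and the function `publish_option_c` are hypothetical.

```python
# Toy illustration of option (C): new URIs only for new or changed terms,
# with old terms kept and flagged as deprecated. All names are hypothetical.
NS1 = "http://example.org/vocab/v1/"
NS2 = "http://example.org/vocab/v2/"


def publish_option_c(old_terms, edited_terms, new_namespace):
    """Return (published URI -> definition, set of deprecated URIs)."""
    published, deprecated = {}, set()
    for uri, definition in edited_terms.items():
        if old_terms.get(uri) == definition:
            # Unchanged term: keep its URI.
            published[uri] = definition
        else:
            # New or changed term: mint a URI in the new namespace.
            local_name = uri.rsplit("/", 1)[-1]
            published[new_namespace + local_name] = definition
            if uri in old_terms:
                # Keep the old URI with its old meaning, flagged as deprecated.
                published[uri] = old_terms[uri]
                deprecated.add(uri)
    return published, deprecated


old_terms = {
    NS1 + "Concept": "a unit of thought",
    NS1 + "broader": "transitive hierarchy link",
}
edited_terms = {
    NS1 + "Concept": "a unit of thought",
    NS1 + "broader": "direct hierarchy link",
}

published, deprecated = publish_option_c(old_terms, edited_terms, NS2)
namespaces = sorted({uri.rsplit("/", 1)[0] + "/" for uri in published})

print(sorted(published))
print(namespaces)
```

The `namespaces` list shows the one drawback named on the slide: a single ontology now spans more than one namespace.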
Pros and Cons
                                            Maintenance  Performance  Broken  Multiple namespaces,  Convenient for
                                            headaches    problems     apps    same ontology         first-time users
A: All new URIs                             x            x            x                             x
B: Same URIs, changed semantics             x                         x                             x
C: New URIs only for new or changed terms                                     x
What would YOU do?
What did THEY do?
WordNet, SKOS, Open Biomedical Ontologies
What Actually Happened?
Open Biomedical Ontologies: (C)
New URIs only for new terms, deprecate old terms
SKOS: (B) Same URIs, Different Semantics
WordNet: (A) Mint all new URIs, multiple times!
http://wordnet.princeton.edu/~agraves/wordnet/0.9/
http://xmlns.com/wordnet/1.6/
http://www.w3.org/2006/03/wn/wn20/instances/
http://www.loa-cnr.it/wn30/instances/
But wait, there’s more:
http://purl.org/vocabularies/princeton/wn30/
http://www.ontologyportal.org/WordNet.owl#WN30-200662589
Why no Uproar?
• SKOS is not a standard
• SKOS is not used by that many people
• It’s just life, people get by
• Few ontology-driven applications
• BUT: this is changing, and business as usual could
result in a Semantic Web Roadblock down the road.
Another Example: DBpedia and Yago
• DBpedia published, without any ontology
• YAGO team created ontology from DBpedia
   • Subset of Wikipedia category hierarchy
   • Only when aligned with WordNet hierarchy
   • http://www.mpii.de/yago/resource/wordnet_calculator_102938886
• DBpedia team added Yago classes to their datasets, but different URIs were used.
   • http://dbpedia.org/class/yago/Calculator102938886
ISSUES:
• Proliferation of URIs.
• A lot of semantics hidden in names.
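To make "semantics hidden in names" concrete, the sketch below pulls the word and the trailing number out of the two URIs shown above. The naming pattern is inferred from these two examples only and is an assumption, not a documented rule of either project; the `hidden_parts` helper is hypothetical.

```python
import re

yago_uri = "http://www.mpii.de/yago/resource/wordnet_calculator_102938886"
dbpedia_uri = "http://dbpedia.org/class/yago/Calculator102938886"


def hidden_parts(uri):
    """Guess (word, number) from the local name; the pattern is an assumption."""
    local_name = uri.rsplit("/", 1)[-1]
    match = re.fullmatch(r"(?:wordnet_)?([A-Za-z_]+?)_?(\d+)", local_name)
    if match is None:
        return None
    word, number = match.groups()
    return word.lower(), number


# Only string surgery on the names reveals that the two URIs refer to the same class.
same_class = hidden_parts(yago_uri) == hidden_parts(dbpedia_uri)

print(hidden_parts(yago_uri))
print(hidden_parts(dbpedia_uri))
```

Any application that relies on this kind of parsing breaks as soon as either naming convention changes, which is one reason to keep meaning out of the identifier.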
Problems and Root Causes
[Diagram; box labels:]
• Ontology-driven applications break
• Maintenance issues
• Performance issues
• URIs overloaded (especially w/ UIDs)
URI Overloading
http://wordnet.princeton.edu/~agraves/wordnet/0.9/
1. Owning / Controlling organization
2. File directory structure
3. Human readable names (ontology and terms)
4. Version number
5. Unique Identifier
6. Web location (URL)
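The six roles listed above can be read off the example URI with the standard library. Which path segment plays which role is an interpretation made for this sketch, and the `guess_roles` helper is hypothetical.

```python
from urllib.parse import urlsplit


def guess_roles(uri):
    """Split a URI into the roles it is carrying; the mapping is an assumption."""
    parts = urlsplit(uri)
    segments = [s for s in parts.path.split("/") if s]
    return {
        "owner": parts.netloc,            # owning / controlling organization
        "directories": segments[:-2],     # file directory structure
        "ontology name": segments[-2],    # human-readable name
        "version": segments[-1],          # version number
        "identifier": uri,                # the whole string is also the unique ID
        "location": uri,                  # ...and the web location (URL)
    }


roles = guess_roles("http://wordnet.princeton.edu/~agraves/wordnet/0.9/")
for role, value in roles.items():
    print(role, "=", value)
```

Because the identifier is the whole string, a change to any one of the other roles (a new host, a new directory, a new version number) changes the identifier too.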
Contributed to the SKOS problem. If URIs were only UIDs:
• Non-transitive broader: create a new resource with a new UID
• Transitive broader: change the human-readable term name to broaderTransitive, same UID
• Voilà!
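The idea can be sketched with a small in-memory registry in which the identifier is opaque and the human-readable name is just an attribute. This is a hypothetical illustration, not how SKOS is actually published; the `registry` structure, the `mint` helper, and the definition strings are invented for the example.

```python
import uuid

# UID -> attributes; the UID itself carries no name, version, or location.
registry = {}


def mint(label, definition):
    """Create a new resource with a fresh opaque UID (hypothetical helper)."""
    uid = uuid.uuid4().hex
    registry[uid] = {"label": label, "definition": definition}
    return uid


# Version 1: one property, labelled "broader", with transitive semantics.
transitive_uid = mint("broader", "transitive hierarchical link")

# Existing data refers to the UID, not to the label.
data = [("ex:wheel", transitive_uid, "ex:car")]

# Version 2: the transitive meaning is unchanged, so it keeps its UID;
# only the human-readable name changes.
registry[transitive_uid]["label"] = "broaderTransitive"

# The new, non-transitive meaning is a new resource with a new UID.
direct_uid = mint("broader", "direct hierarchical link")

# Old data still resolves to the meaning it was created with.
print(registry[data[0][1]])
```

The label "broader" has moved to a different resource, yet no existing statement changed meaning, because statements never depended on the label.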
Problems and Root Causes
[Diagram; box labels:]
• Ontology-driven applications break
• Maintenance issues
• Performance issues
• Overuse of owl:sameAs
• Proliferation of URIs
• URIs overloaded (especially w/ UIDs)
• Poor change mgmt. infrastructure
Change Management Infrastructure
• Inadequate to non-existent
   • Stopgap: annotation properties for versioning
• Technologies immature
• Purposely delayed by W3C
HENCE: no versioning guidelines
Problems and Root Causes
[Diagram; box labels:]
• Ontology-driven applications break
• Maintenance issues
• Performance issues
• Overuse of owl:sameAs
• Proliferation of URIs
• URIs overloaded (especially w/ UIDs)
• Poor change mgmt. infrastructure
• No versioning guidelines
• Change semantics of URIs
• Semantic infidelity
• Overloading owl:sameAs
What can be done?
1. Imagine a future:
• Change management and versioning is solved.
• Specify exactly WHAT that would mean
(Don’t worry about HOW)
• Ontology-driven applications are the norm.
2. Build guidelines that will work in this future.
Change Management & Versioning Solved
• Unique IDs are separated from URLs and all the rest.
• Automatic tracking and detection of dependencies
• Automatic minting of new UIDs when semantics changes
   • Don’t change name if semantics is the same
   • Don’t change semantics if name is the same
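The last two rules lend themselves to an automatic check between versions. The sketch below is one possible reading, assuming each version is a mapping from term name to a definition string and that string equality is an acceptable stand-in for "same semantics"; the `check_version` function and the sample terms are hypothetical.

```python
def check_version(old, new):
    """Compare two versions (name -> definition) and list guideline violations."""
    violations = []

    # Rule: don't change semantics if the name is the same.
    for name in sorted(old.keys() & new.keys()):
        if old[name] != new[name]:
            violations.append(("semantics changed under same name", name))

    # Rule: don't change name if the semantics is the same.
    old_name_for = {definition: name for name, definition in old.items()}
    for name in sorted(new.keys() - old.keys()):
        previous = old_name_for.get(new[name])
        if previous is not None and previous not in new:
            violations.append(("name changed for same semantics", previous + " -> " + name))

    return violations


v1 = {"broader": "transitive link", "Concept": "unit of thought"}
changed_meaning = {"broader": "direct link", "Concept": "unit of thought"}
renamed_only = {"broader": "transitive link", "Idea": "unit of thought"}

print(check_version(v1, changed_meaning))
print(check_version(v1, renamed_only))
```

A real tool would need a far better notion of semantic equality than comparing strings, which is part of what the slide means by change management being unsolved.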
SUMMARY: Problems and Root Causes
[Diagram; box labels:]
• Proliferation of URIs
• Overuse of owl:sameAs
• Ontology-driven applications break
• Maintenance issues
• Performance issues
• Poor change mgmt. infrastructure
• URIs overloaded (especially w/ UIDs)
• Change semantics of URIs
• No versioning guidelines
• Overloading owl:sameAs
• Semantic infidelity