DRIVER Guidelines and Repository Interoperability
DESCRIPTION
On 2008-11-15 Maurice Vanderfeesten gave a presentation in Baltimore at the SPARC OpenAccess conference. The presentation explains the need for interoperability among repository systems. DRIVER provides guidelines on how to expose metadata via OAI-PMH in an internationally compliant way.
TRANSCRIPT
Fasten … Seatbelt
Maurice Vanderfeesten - SURFfoundation (NL)
15 November 2008 – Baltimore – DRIVER meeting
Fasten: Excel in scholarly communication
Seatbelt: Get to the finish line safely
Innovation towards the intelligent web
[Figure by Radar Networks / TWINE (axes: Amount of data, Productivity of Search) showing five eras: PC Era 1980 - 1990 (The Desktop), Web 1.0 1990 - 2000 (The World Wide Web), Web 2.0 2000 - 2010 (The Social Web), Web 3.0 2010 - 2020 (The Semantic Web) and Web 4.0 2020 - 2030 (The Intelligent Web). Search techniques progress from Files & Folders and Databases via Keyword search, Directories, Tagging and Natural language search to Semantic search and Reasoning.]
Work together: respect some rules
Global Digital Repository Infrastructure
One goal: “Reliable Content Provision”
TICER 2008, Tilburg
Widespread metadata standards: Unqualified Dublin Core & OAI-PMH
Problem: interpreting semantics; the standard specifications are not enough
Example: Electronic theses need context-specific descriptions for date, type, roles & language
Reality: Efforts to interpret and normalize data
- Trouble automatically interpreting semantics, e.g. [date] (Cranfield)
Effort interpreting dates
<dc:contributor>Partington, David(supervisor)</dc:contributor>
<dc:creator>Lupson, Jonathan</dc:creator>
<dc:date>2007-06-06T18:17:13Z</dc:date>
<dc:date>2007-06-06T18:17:13Z</dc:date>
<dc:date>2007-02</dc:date>
<dc:identifier>http://hdl.handle.net/1826/1729</dc:identifier>
<dc:description>
(Publication?) (Graduation?) (Start?)
Humboldt: <dc:date>2007-06-07</dc:date> (Graduation), <dc:date>2007-03-06</dc:date> (Publication), <dc:date>2003-02</dc:date> (Start)
Recommendation: in Unqualified Dublin Core use one date field that represents the Publication date!
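As a minimal sketch of how a repository manager or harvester might check this recommendation, the Python below flags oai_dc records that carry more than one dc:date, or a date that is not a plain YYYY, YYYY-MM or YYYY-MM-DD value. The date-format rule and the function name `date_problems` are assumptions for illustration (the slide only asks for a single publication date); the sample record reuses the Cranfield dates shown above.

```python
import re
import xml.etree.ElementTree as ET
from typing import List

DC_NS = "http://purl.org/dc/elements/1.1/"
# Assumption: a publication date is written as YYYY, YYYY-MM or YYYY-MM-DD.
DATE_RE = re.compile(r"^\d{4}(-\d{2}(-\d{2})?)?$")


def date_problems(record_xml: str) -> List[str]:
    """Return the dc:date problems found in one oai_dc record."""
    root = ET.fromstring(record_xml)
    dates = [(el.text or "").strip() for el in root.iter(f"{{{DC_NS}}}date")]
    problems = []
    if len(dates) != 1:
        problems.append(f"expected exactly one dc:date, found {len(dates)}")
    for value in dates:
        if not DATE_RE.match(value):
            problems.append(f"not a YYYY[-MM[-DD]] date: {value}")
    return problems


# Sample record containing the three Cranfield dates.
cranfield = """<oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
    xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:date>2007-06-06T18:17:13Z</dc:date>
  <dc:date>2007-06-06T18:17:13Z</dc:date>
  <dc:date>2007-02</dc:date>
</oai_dc:dc>"""

for problem in date_problems(cranfield):
    print(problem)
```

A record that follows the recommendation, with a single value such as 2007-03-06, produces an empty problem list.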
- Trouble automatically interpreting semantics, e.g. [type]
Effort interpreting types
DIVA: <dc:type>text.thesis.doctoral</dc:type>
Cranfield: <dc:type>Thesis or dissertation</dc:type> <dc:type>Doctoral</dc:type> <dc:type>PhD</dc:type>
Humboldt: <dc:type>Text</dc:type> <dc:type>dissertation</dc:type>
Recommendation: use the following qualifications: “info:eu-repo/semantics/bachelorThesis”, “info:eu-repo/semantics/masterThesis”, “info:eu-repo/semantics/doctoralThesis” (Bologna Convention)
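A harvester that receives free-text types like the ones above has to map them onto this vocabulary itself. The sketch below shows one way to do that with ordered keyword rules; the rules, including reading "dissertation" on its own as a doctoral thesis (German usage), are hypothetical heuristics rather than part of the DRIVER guidelines, and a real mapping would likely be configured per repository.

```python
from typing import List, Optional

EU_REPO = "info:eu-repo/semantics/"

# Hypothetical keyword rules, checked in order. Bachelor and master come first
# so that, e.g., a UK "Masters" dissertation is not caught by "dissertation".
RULES = [
    (("bachelor",), "bachelorThesis"),
    (("master",), "masterThesis"),
    # Assumption: "dissertation" alone denotes a doctoral thesis (German usage).
    (("doctoral", "phd", "dissertation"), "doctoralThesis"),
]


def map_thesis_type(local_types: List[str]) -> Optional[str]:
    """Map a record's dc:type values to one info:eu-repo thesis type, if any."""
    text = " ".join(local_types).lower()
    for keywords, term in RULES:
        if any(keyword in text for keyword in keywords):
            return EU_REPO + term
    return None


print(map_thesis_type(["text.thesis.doctoral"]))                       # DIVA
print(map_thesis_type(["Thesis or dissertation", "Doctoral", "PhD"]))  # Cranfield
print(map_thesis_type(["Text", "dissertation"]))                       # Humboldt
```

Returning None for unrecognised values leaves those records for manual review instead of guessing a type.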
1. Electronic theses need context-specific descriptions
Effort interpreting roles
<dc:contributor>Partington, David(supervisor)</dc:contributor>
<dc:creator>Lupson, Jonathan</dc:creator>
<dc:date>2007-06-06T18:17:13Z</dc:date>
<dc:date>2007-06-06T18:17:13Z</dc:date>
<dc:date>2007-02</dc:date>
<dc:identifier>http://hdl.handle.net/1826/1729</dc:identifier>
<dc:description>
Recommendation: use the contributor field in Dublin Core only for the person who supervised the Doctoral thesis project.
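The Cranfield record encodes the role inside the name ("Partington, David(supervisor)"). A harvester applying this recommendation to such records could keep only the contributors annotated as supervisor, as in the sketch below. It assumes the inline "Name(role)" notation seen in that one record; other repositories may encode roles differently, and the function name `supervisors` is hypothetical.

```python
import re
from typing import List

# Matches the inline role notation "Name(role)", e.g. "Partington, David(supervisor)".
ROLE_RE = re.compile(r"^(?P<name>.*?)\s*\((?P<role>[^()]*)\)\s*$")


def supervisors(contributors: List[str]) -> List[str]:
    """Return the names of contributors annotated as supervisor, without the annotation."""
    names = []
    for value in contributors:
        match = ROLE_RE.match(value)
        if match and match.group("role").strip().lower() == "supervisor":
            names.append(match.group("name").strip())
    return names


# "Doe, Jane(editor)" is a hypothetical second contributor.
print(supervisors(["Partington, David(supervisor)", "Doe, Jane(editor)"]))
```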
Personal notation flavour of a language
Effort interpreting languages
<dc:language>Nederlands</dc:language>
<dc:language>ned</dc:language>
<dc:language>nl</dc:language>
<dc:language>nld/dut</dc:language>
<dc:language>en_UK</dc:language>
<dc:language>mn</dc:language>
Recommendation: use ISO 639-3 as the standard way of writing down a language in a repository
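The variants above can be normalised with a lookup table. The sketch below is a minimal illustration, assuming a small hand-made table (`VARIANTS` is hypothetical); a production version would load the full ISO 639-3 code table instead. It strips locale suffixes such as "_UK" and tries each alternative in combined notations such as "nld/dut".

```python
from typing import Optional

# Hypothetical lookup table covering a few of the variants seen in harvested records.
VARIANTS = {
    "nederlands": "nld",
    "dutch": "nld",
    "ned": "nld",
    "nl": "nld",   # ISO 639-1
    "dut": "nld",  # ISO 639-2/B
    "nld": "nld",  # ISO 639-2/T, identical to ISO 639-3
    "en": "eng",
    "eng": "eng",
}


def to_iso639_3(value: str) -> Optional[str]:
    """Normalise a free-text dc:language value to an ISO 639-3 code, if known."""
    v = value.strip().lower()
    # Locale strings such as "en_UK" or "en-GB": keep only the language part.
    v = v.replace("-", "_").split("_")[0]
    # Combined notations such as "nld/dut": try each alternative in turn.
    for part in v.split("/"):
        code = VARIANTS.get(part.strip())
        if code:
            return code
    return None


for raw in ["Nederlands", "ned", "nl", "nld/dut", "en_UK"]:
    print(raw, "->", to_iso639_3(raw))
```

Unknown values return None so they can be reported rather than silently rewritten.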
The number of repositories is increasing
DRIVER: Collection of Quality Metadata for OpenAccess Material
All service providers must build adaptors for every single repository
Interoperability shares the workload
Global Digital Repository Infrastructure
One goal: “Reliable Content Provision”
Reliability: Broken Links Issue
[Diagram: Repository → URL]
Reliability: Link resolvers
[Diagram: Repositories send ID + URL updates via OAI-PMH to a Global Resolver, which resolves an ID to its current URL]
• Use IDs for citation reference
• Obligation to update
• Technology independent (future proof)
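The resolver idea can be sketched as a table from persistent ID to current URL. The class below is a toy illustration under stated assumptions: `GlobalResolver`, its methods, the identifier and the URLs are all hypothetical, and the OAI-PMH transport of the updates is left out. It shows why citing the ID survives a URL change as long as repositories honour the obligation to update.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class GlobalResolver:
    """Toy global resolver mapping persistent IDs to current URLs (hypothetical design)."""
    table: Dict[str, str] = field(default_factory=dict)

    def update(self, pid: str, url: str) -> None:
        # Repositories report "ID + URL updates", e.g. after moving to new software.
        self.table[pid] = url

    def resolve(self, pid: str) -> Optional[str]:
        # Citations carry the ID, so they keep working when the URL changes.
        return self.table.get(pid)


resolver = GlobalResolver()
# Hypothetical identifier and URLs.
resolver.update("pid:example/0001", "http://repository.example.org/old/0001")
resolver.update("pid:example/0001", "https://repository.example.org/items/0001")
print(resolver.resolve("pid:example/0001"))
```

Because the table is keyed on the ID rather than on any repository technology, the scheme stays technology independent.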
Standards, Agreements, Rules: Interoperability guidelines
Towards web-reasoning: data efficiency & interoperability levels
By: Andreas Tolk et al., "Composable M&S Web Services for Net-centric Applications," Journal for Defense Modeling & Simulation (JDMS), Volume 3, Number 1, pp. 27-44, January 2006
Interoperability leads towards improved retrieval and recall
[Figure: the "Innovation towards the intelligent web" chart by Radar Networks / TWINE, shown again]
We have: Tools for Syntactic & Semantic Interoperability
- Guidelines for content providers, exposing textual resources with OAI-PMH
- Validator, checking the rate of compliance with the “Guidelines for content providers”
Guidelines 2.0
- Built on knowledge from past & current IR projects (EU)
- 26 actively involved contributors (experts and repository managers) from 8 countries
- Practical answers for IRs on how to:
  - Improve full-text access
  - Standardize metadata quality
  - Create a reliable infrastructure for permanent identification, resolution, traceability and storage
  - Resolve semantic and classification issues
Guidelines 2.0 - Chapters
1. Use of OAI-PMH
2. Use of Metadata OAI_DC
3. Use of Best Practices for OAI_DC
4. Use of Compound Object Wrapping
5. Use of Vocabularies and Semantics
6. Use of Quality labels (Long Term Preservation)
7. Use of Persistent Identifiers
8. Use of Usage Statistics Exchange
9. Use of Intellectual Property Rights (IPR)
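Chapter 1 concerns the use of OAI-PMH itself. As a minimal sketch of the harvesting side (no network I/O), the Python below builds OAI-PMH 2.0 ListRecords requests for oai_dc and reads the resumptionToken that signals further pages. The base URL and the set name "driver" in the usage lines are assumptions for illustration; the verb, argument names and the exclusive resumptionToken argument come from the OAI-PMH protocol.

```python
import xml.etree.ElementTree as ET
from typing import Optional
from urllib.parse import urlencode

OAI_NS = "http://www.openarchives.org/OAI/2.0/"


def list_records_url(base_url: str, set_spec: Optional[str] = None,
                     resumption_token: Optional[str] = None) -> str:
    """Build a ListRecords request for oai_dc, or a follow-up request."""
    if resumption_token:
        # In OAI-PMH, resumptionToken is an exclusive argument.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if set_spec:
            params["set"] = set_spec
    return f"{base_url}?{urlencode(params)}"


def next_token(response_xml: str) -> Optional[str]:
    """Return the resumptionToken of a ListRecords response, or None on the last page."""
    root = ET.fromstring(response_xml)
    el = root.find(f".//{{{OAI_NS}}}resumptionToken")
    # A missing or empty resumptionToken element marks the end of the list.
    if el is None or not (el.text or "").strip():
        return None
    return el.text.strip()


# Hypothetical repository endpoint and set name.
print(list_records_url("https://repository.example.org/oai", set_spec="driver"))
```

A harvester would loop: fetch the first URL, pass each response to `next_token`, and request the next page until it returns None.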
Validator
- Deep validation
- Experimental tool
- Self-test for Repository Managers
- Embedded in the DRIVER registration process
- Detects interoperability issues
- Provides an explanation for each interoperability issue
- Points to the exact location of the issue for easy debugging
- Offers recommendations on how to modify your repository so that it complies with interoperability standards
- Creates a report for future reference
- Provides a weighted score for balanced effort
- The score influences the result list
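A weighted score of this kind can be sketched as the share of total weight earned by the passing checks. The check names and weights below are hypothetical placeholders, not the DRIVER Validator's actual rules; the point is only that heavier checks (for example, a working OAI-PMH endpoint) move the score more than minor ones.

```python
from typing import Dict

# Hypothetical checks and weights, not the Validator's real configuration.
WEIGHTS: Dict[str, int] = {
    "oai_pmh_responds": 5,
    "single_dc_date": 2,
    "eu_repo_type": 2,
    "iso639_3_language": 1,
}


def weighted_score(results: Dict[str, bool]) -> float:
    """Percentage of the total weight earned by the checks that passed."""
    total = sum(WEIGHTS.values())
    earned = sum(w for name, w in WEIGHTS.items() if results.get(name, False))
    return round(100 * earned / total, 1)


print(weighted_score({
    "oai_pmh_responds": True,
    "single_dc_date": False,
    "eu_repo_type": True,
    "iso639_3_language": False,
}))
```

With these example weights the repository above earns 7 of 10 points, i.e. 70.0; a missing check counts as failed.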
Looking back on what we have:
- Guidelines for content providers, exposing textual resources with OAI-PMH
- Validator, checking the rate of compliance with the “Guidelines for content providers”
What is missing?
Trias Politica Model
[Diagram: Legislative: Guidelines]
[Figure: the "Innovation towards the intelligent web" chart by Radar Networks / TWINE, shown again]
We DON’T have:
- A structure for worldwide acceptance of Repository Interoperability Guidelines
- Executive enforcement enabling action on adopting Interoperability Guidelines for Repositories worldwide, at national and local level
Questions
• What strategies can be used to create a global “Trias Politica” for repositories in order to enforce “reliable content provision” by using interoperability guidelines?
• What strategies are there to maintain repository guidelines? Who is responsible?
• What strategies are known to create an acceptance mechanism for global agreement on repository guidelines?
• What strategies can be used to enforce repository guidelines?
• Who is responsible for the (metadata) quality of the repository output?