Provenance of scientific information
as experienced in DRIVER
6th e-Infrastructure Concertation Event
Lyon, 24th November 2008
Wolfram Horstmann
Bielefeld University / DRIVER
Notions of Provenance
• Where do data objects* originate from?
  – Scientific Work -- examples
    • Instrumentation techniques
      – Manufacturers of hard- and software
    • Methodologies
      – Processes, e.g. gene sequencing
  – Technical/Local -- examples
    • (web)-identifiers
    • Database, repository name
* Primary data, documents, metadata …
Why Provenance?
• Quoting / Citing / Referencing as global scientific principle
  – „Reproducible research“
• Giving credits to authors / creators in distributed environments
• Original location / context has to be known
• Experienced in Grid-Environments [1]
Provenance & Interoperability
• Re-Use / Sharing: “Addressing/Accessing”
  – Common view, common use
  – Unidirectional: No change of data objects!
• Federation: “Discovering in Context”
  – Remote representation of distributed data objects (DOs)
• Aggregation: “Contextualizing”
  – Add unchanged object in a context
• Processing/Annotation: “Changing”
  – Uni- vs. bidirectional: change of DOs and remote representation vs. back-storage (e.g. CVS)
Basic Provenance Settings
• Indicate Production Situation
  – Metadata
    • Author, Instrumentation etc.
• Remote Representation
  – Indicate place of origin in remote systems
    • Metadata as digital objects / first order citizens
  – Allow lineage representation
    • Credits in remote environments / versioning
Orders of Provenance
• 1st order: Metadata
  – Provenance attached to data
  – Minimal „knowledge“ required in application
  – Allow remote handling of data objects
  – Require metadata infrastructure
  – Metadata introduce 2 objects: requires linkage
• 2nd order: context / compounds
  – Express multiple relations between objects
  – May introduce semantic model
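The two orders can be illustrated with a minimal Python sketch. It is not DRIVER code: the class names, the relation names ("isBasedOn", "isDescribedBy") and all example.org URIs are assumptions made up for illustration. A 1st order record is a second object that must carry a link to the data object it describes; a 2nd order context holds several named relations between objects.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class DataObject:
    """A data object (primary data, document, ...) with a web identifier."""
    uri: str


@dataclass(frozen=True)
class MetadataRecord:
    """1st order: a separate object that describes one data object."""
    uri: str
    describes: str   # the linkage: URI of the described data object
    author: str
    repository: str  # place of origin


@dataclass
class Context:
    """2nd order: a compound holding named relations between objects."""
    uri: str
    relations: list = field(default_factory=list)  # (subject, predicate, object)

    def relate(self, subject: str, predicate: str, obj: str) -> None:
        self.relations.append((subject, predicate, obj))

    def related(self, subject: str, predicate: str) -> list:
        return [o for s, p, o in self.relations if s == subject and p == predicate]


# Hypothetical identifiers, for illustration only.
dataset = DataObject("http://repo.example.org/data/42")
record = MetadataRecord(
    uri="http://repo.example.org/meta/42",
    describes=dataset.uri,
    author="A. Author",
    repository="Example Repository",
)

compound = Context("http://aggregator.example.org/compound/1")
compound.relate("http://repo.example.org/doc/7", "isBasedOn", dataset.uri)
compound.relate(dataset.uri, "isDescribedBy", record.uri)
```

The `describes` field is the linkage that the 1st order approach requires; the free-form predicates in `Context` are where a semantic model could later constrain which relations are allowed.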
Provenance in DRIVER #1
• Simple Objects: OAI-PMH [2]
  – 1st order provenance
    • Metadata: minimum OAI-DC
  – 2nd order provenance
    • DRIVER explicit identifiers for repositories
    • OAI-PMH: inline representation („about“)
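The inline „about“ representation can be read with a few lines of Python. This is a sketch, not DRIVER's harvester: the element and attribute names follow the OAI-PMH provenance guidelines [2], while the repository URL, identifier and dates in the sample are invented. The function `origin_chain` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

PROV_NS = "http://www.openarchives.org/OAI/2.0/provenance"

# Hypothetical sample of an <about> container as an aggregator might re-expose it.
ABOUT_XML = """
<about>
  <provenance xmlns="http://www.openarchives.org/OAI/2.0/provenance">
    <originDescription harvestDate="2008-11-01T10:00:00Z" altered="false">
      <baseURL>http://repo.example.org/oai</baseURL>
      <identifier>oai:repo.example.org:42</identifier>
      <datestamp>2008-10-15</datestamp>
      <metadataNamespace>http://www.openarchives.org/OAI/2.0/oai_dc/</metadataNamespace>
    </originDescription>
  </provenance>
</about>
"""


def origin_chain(about_xml: str) -> list:
    """Return one dict per originDescription, most recent harvest first."""
    root = ET.fromstring(about_xml)
    ns = {"p": PROV_NS}
    chain = []
    node = root.find("p:provenance/p:originDescription", ns)
    while node is not None:
        chain.append({
            "baseURL": node.findtext("p:baseURL", namespaces=ns),
            "identifier": node.findtext("p:identifier", namespaces=ns),
            "datestamp": node.findtext("p:datestamp", namespaces=ns),
            "harvestDate": node.get("harvestDate"),
            "altered": node.get("altered") == "true",
        })
        # A record harvested several times nests one originDescription per hop.
        node = node.find("p:originDescription", ns)
    return chain


for hop in origin_chain(ABOUT_XML):
    print(hop["baseURL"], hop["identifier"], "altered" if hop["altered"] else "unaltered")
```

The last entry of the chain is the place of origin, which is what a federator or aggregator needs in order to link back to the data provider.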
Provenance in DRIVER #2
• „Enhanced Publications“
  – Research project in DRIVER-II
  – Representation of data / document packages
  – Use of OAI-ORE
Provenance in OAI-ORE
• OAI-ORE: Object Re-Use and Exchange [3]
  – Uses Resource Maps (a kind of Named Graph [4])
  – Uses „lineage“ to represent explicit provenance
  – Future: explicit provenance model [6]?
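A rough Python sketch of how lineage can be followed in ORE data, using plain tuples instead of an RDF library. The term names (describes, aggregates, proxyFor, proxyIn, lineage) are taken from the ORE vocabulary; every example.org URI and the helper `origin_aggregations` are assumptions for illustration. In ORE, lineage relates two proxies for the same resource, pointing from the proxy in the current aggregation to the proxy in the aggregation the resource was taken from.

```python
ORE = "http://www.openarchives.org/ore/terms/"

# Hypothetical URIs, for illustration only.
REM_A = "http://a.example.org/rem/1"          # Resource Map published by A
AGG_A = "http://a.example.org/aggregation/1"  # aggregation described by REM_A
AGG_B = "http://b.example.org/aggregation/9"  # aggregation the dataset came from
DATASET = "http://repo.example.org/data/42"
PROXY_A = "http://a.example.org/proxy/42"     # the dataset in the context of AGG_A
PROXY_B = "http://b.example.org/proxy/42"     # the dataset in the context of AGG_B

triples = {
    (REM_A, ORE + "describes", AGG_A),
    (AGG_A, ORE + "aggregates", DATASET),
    (AGG_B, ORE + "aggregates", DATASET),
    (PROXY_A, ORE + "proxyFor", DATASET),
    (PROXY_A, ORE + "proxyIn", AGG_A),
    (PROXY_B, ORE + "proxyFor", DATASET),
    (PROXY_B, ORE + "proxyIn", AGG_B),
    (PROXY_A, ORE + "lineage", PROXY_B),      # A took the dataset from B
}


def objects(graph, subject, predicate):
    """All objects of (subject, predicate, ?) in the graph, sorted."""
    return sorted(o for s, p, o in graph if s == subject and p == predicate)


def origin_aggregations(graph, proxy):
    """Follow lineage links from a proxy; return the aggregations it came from."""
    seen, result, todo = {proxy}, [], [proxy]
    while todo:
        current = todo.pop()
        for parent in objects(graph, current, ORE + "lineage"):
            if parent not in seen:
                seen.add(parent)
                result.extend(objects(graph, parent, ORE + "proxyIn"))
                todo.append(parent)
    return result


print(origin_aggregations(triples, PROXY_A))
```

Because the object itself is unchanged, provenance lives entirely in the proxies: the same dataset URI appears in both aggregations, and only the lineage link records which context it was taken from.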
Summary
• Provenance essential for …
  – Indicating origin in distributed data spaces
    • Accessing / Addressing
    • Federation / Aggregation
    • Processing / Annotation
  – Document and data citation / trace-back
  – 1st order: describing data > metadata
  – 2nd order: describing context > semantic data
Lessons learnt in DRIVER
• Use web-enabled Identification (URI/UDDI etc.)
  – „Dark“ databases don’t interoperate
• 1st order provenance at place of origin
  – Requires metadata to describe origin
  – Enables a metadata infrastructure
  – Introduces linkage problem
• 2nd order provenance in contexts
  – Requires data provider identification in federators / aggregators in order to link back
  – May require semantic model for context
  – Would benefit from a semantic infrastructure
Resources
[1] On provenance in the eScience / grid environment
  – http://www.sigmod.org/sigmod/record/issues/0509/p31-special-sw-section-5.pdf
  – In GLITE
    • http://www.cesnet.cz/doc/techzpravy/2007/glite-job-provenance/
    • http://twiki.ipaw.info/bin/view/Challenge
[2] On provenance in OAI-PMH
  – http://www.openarchives.org/OAI/2.0/guidelines-provenance.htm
[3] On provenance in OAI-ORE (referred to as ore:lineage)
  – http://www.openarchives.org/ore/meetings/Soton/ore_beyond_basics.pdf (general)
  – http://www.openarchives.org/ore/1.0/vocabulary (definition)
[4] Named Graphs, Provenance and Trust (Carroll et al.)
  – http://www4.wiwiss.fu-berlin.de/bizer/SWTSGuide/carroll-ISWC2004.pdf
[5] W3C: On provenance in RDF
  – http://www.w3.org/2001/12/attributions/
[6] Open Provenance Model
  – http://eprints.ecs.soton.ac.uk/14979/1/opm.pdf
[7] DRIVER: Digital Repository Infrastructure Vision for European Research
  – http://www.driver-community.eu