Is preserving data enough? Towards the preservation of scientific methods
TRANSCRIPT
Daniel Garijo, Oscar Corcho, Khalid Belhajjame, Lourdes Verdes-Montenegro, Julián Garrido, Raúl
Palma, Cezary Mazurek and Kristina Hettne
Ontology Engineering Group (Universidad Politécnica de Madrid)
University Paris-Dauphine
AMIGA (Instituto de Astrofísica de Andalucía)
Poznan Supercomputing and Networking Center
LUMC
Warsaw, May 28th 2015
Where does data come from? Scientific workflows
Benefits:
• Sharing and reusing previous work
• Time savings: re-execution of old experiments with different parameters
• Teaching: new students can learn existing methods in the lab
• Design for modularity, so others can reuse
• Design for standardization, reduction of heterogeneity
• Debugging of executions
• Paper writing, linking execution pipelines to publications
• Reproducibility
• Etc.
Lab book
Digital Log
Workflow
Experiment
How do we preserve workflows?
Workflow repositories are great! But:
• Manual annotation and documentation
• Workflow conservation plan?
• No clear link between data and method
• How to reproduce a workflow?
Workflows keep breaking!
• Zhao et al: Why Workflows Break - Understanding and Combating Decay in Taverna Workflows. >90 workflows analyzed
• Third party resources not available/accessible
• Missing example data
• Lack of documentation
• Incomplete metadata
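The four decay causes listed above can be turned into a simple audit. The following is a minimal sketch, not the method used by Zhao et al.: the `WorkflowRecord` fields, the `REQUIRED_METADATA` tuple and the `decay_risks` function are hypothetical names introduced for illustration, and the reachability check is passed in as a function so the example runs without network access.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class WorkflowRecord:
    """Hypothetical description of a stored workflow (not a Taverna schema)."""
    name: str
    third_party_urls: List[str] = field(default_factory=list)
    example_inputs: List[str] = field(default_factory=list)
    documentation: str = ""
    metadata: dict = field(default_factory=dict)


# Assumed minimum metadata; the slides do not say which fields count as complete.
REQUIRED_METADATA = ("author", "created", "license")


def decay_risks(wf: WorkflowRecord, is_reachable: Callable[[str], bool]) -> List[str]:
    """Return the decay causes from the slide that apply to this workflow."""
    risks = []
    if any(not is_reachable(url) for url in wf.third_party_urls):
        risks.append("third-party resource not available")
    if not wf.example_inputs:
        risks.append("missing example data")
    if not wf.documentation.strip():
        risks.append("lack of documentation")
    if any(key not in wf.metadata for key in REQUIRED_METADATA):
        risks.append("incomplete metadata")
    return risks


wf = WorkflowRecord(
    name="demo",
    third_party_urls=["http://example.org/service"],
    metadata={"author": "A. Researcher"},
)
# The reachability check is injected; here every service is treated as down.
print(decay_risks(wf, is_reachable=lambda url: False))
```

In practice `is_reachable` could wrap an HTTP request; keeping it as a parameter makes the audit testable offline.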
Do I have to document everything again?? Didn’t I just write a paper?
Our solution: Data + method = Context - Research Object
Aggregation of resources that bundles together the contents of a research work
OAI-ORE + PROV + OA
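To make the idea of an aggregation concrete, here is a minimal in-memory sketch of a research object that bundles resources, records which run produced which data, and attaches annotations. The `ResearchObject` class and its methods are hypothetical and are not the Research Object model's own API or serialization; only the property names (`ore:aggregates`, `prov:used`, `prov:wasGeneratedBy`, `oa:hasTarget`, `oa:hasBody`) are taken from the three vocabularies named on the slide. The file names are invented.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class ResearchObject:
    """Hypothetical container; a real research object would be expressed in RDF."""
    uri: str
    aggregated: List[str] = field(default_factory=list)                   # ore:aggregates
    provenance: List[Tuple[str, str, str]] = field(default_factory=list)  # PROV statements
    annotations: List[Dict[str, str]] = field(default_factory=list)       # OA annotations

    def aggregate(self, resource: str) -> None:
        """Add a resource to the bundle once."""
        if resource not in self.aggregated:
            self.aggregated.append(resource)

    def used(self, activity: str, entity: str) -> None:
        """Record that a workflow run consumed a resource."""
        self.aggregate(entity)
        self.provenance.append((activity, "prov:used", entity))

    def was_generated_by(self, entity: str, activity: str) -> None:
        """Record that a resource was produced by a workflow run."""
        self.aggregate(entity)
        self.provenance.append((entity, "prov:wasGeneratedBy", activity))

    def annotate(self, target: str, body: str) -> None:
        """Attach a free-text description to an aggregated resource."""
        self.annotations.append({"oa:hasTarget": target, "oa:hasBody": body})

    def generated_by(self, entity: str) -> List[str]:
        """Answer 'which run produced this data?', the data-to-method link."""
        return [o for s, p, o in self.provenance
                if s == entity and p == "prov:wasGeneratedBy"]


ro = ResearchObject("ro/demo")
ro.aggregate("workflow.t2flow")
ro.aggregate("input.csv")
ro.used("run-1", "input.csv")
ro.was_generated_by("result.csv", "run-1")
ro.annotate("workflow.t2flow", "Computes the demo result")
print(ro.aggregated)
print(ro.generated_by("result.csv"))
```

Keeping provenance inside the same bundle as the data is what lets a reader go from a result file back to the run, and from the run to the workflow.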
How to preserve Research Objects?
Three main ways/levels:
• Descriptive reproducibility
  • Documentation
• Workflow execution reproducibility
  • Can we run the workflow?
• Workflow results reproducibility
  • Can we get the same results?
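The three levels above can be read as successive checks. The sketch below is one possible interpretation, under stated assumptions: `reproducibility_levels` is a hypothetical function, the workflow is represented by any callable returning bytes, and "same results" is taken to mean a byte-identical output compared by SHA-256 digest. Real experiments may need a looser comparison, for example a numeric tolerance.

```python
import hashlib
from typing import Callable, List, Optional


def digest(data: bytes) -> str:
    """SHA-256 hex digest of an output, used as the stored reference result."""
    return hashlib.sha256(data).hexdigest()


def reproducibility_levels(
    documentation: str,
    run: Callable[[], bytes],
    expected_digest: Optional[str],
) -> List[str]:
    """Return which of the three levels a workflow currently reaches."""
    levels = []
    # Level 1: is there any documentation at all?
    if documentation.strip():
        levels.append("descriptive")
    # Level 2: can we run the workflow?
    try:
        output = run()
    except Exception:
        return levels
    levels.append("execution")
    # Level 3: do we get the same results (assumed: byte-identical output)?
    if expected_digest is not None and digest(output) == expected_digest:
        levels.append("results")
    return levels


reference = digest(b"42")
print(reproducibility_levels("Computes the answer.", lambda: b"42", reference))
```

A workflow that still runs but yields a different output would stop at "execution", which is the case the third level is meant to expose.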
Checklists!
• Corcho et al: Checklist for workflow conservation.
  • http://dx.doi.org/10.6084/m9.figshare.1285011
  • 40 different aspects
    • Documentation
    • Goals
    • Results
    • Metadata
    • ….
• Corcho et al: Checklist for a workflow conservation plan
  • http://dx.doi.org/10.6084/m9.figshare.1285012
  • Based on the DCC’s data management plan
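A checklist of this kind lends itself to a small coverage report. The sketch below is illustrative only: it uses just the four aspect names visible on the slide, whereas the published checklist covers 40 aspects that are not reproduced here, and the `evaluate` function is a hypothetical helper.

```python
from typing import Dict, List, Tuple

# Only the four aspects named on the slide; the full checklist has 40.
ASPECTS = ("Documentation", "Goals", "Results", "Metadata")


def evaluate(answers: Dict[str, bool]) -> Tuple[float, List[str]]:
    """Return the fraction of aspects covered and the list of missing ones."""
    missing = [aspect for aspect in ASPECTS if not answers.get(aspect, False)]
    coverage = (len(ASPECTS) - len(missing)) / len(ASPECTS)
    return coverage, missing


coverage, missing = evaluate({"Documentation": True, "Goals": True, "Results": False})
print(f"{coverage:.0%} covered; missing: {', '.join(missing)}")
```

An aspect that is not answered at all is counted as missing, so an incomplete checklist never looks better than it is.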
Some examples
Levels of reproducibility
Workflow conservation Plan
Conclusions
• Research Objects help bundle data and methods (scientific workflows) and bridge the gap between them
• We need to preserve research objects as much as the data and the workflows used to obtain it!
  • Documentation
  • Ability to execute the experiment
  • Ability to obtain the same results
• Checklists are a first step towards improving the documentation, archival and preservation of research objects.
http://www.researchobject.org/