Converting Scripts into Reproducible Workflow Research Objects
Lucas A. M. C. Carvalho, Khalid Belhajjame, Claudia Bauzer Medeiros
Baltimore, Maryland, USA
October 23-26, 2016
Background and Motivation
● Data-Intensive Experiments
– Collection of scripts, programs and (big) data
– Manual collection and organization of data provenance
– Papers
How to understand, reproduce or reuse data and models of experiments?
Background and Motivation
● Script-based experiments
– What are the inputs and outputs?
– How can this local program be replaced with a similar web service?
– Difficult to understand, to reuse, and to reproduce.
[Figure: Example of script code.]
Background and Motivation
● Scientific Workflows
[Figure: Example of a Scientific Workflow Management System.]
Overview
Create, Understand, Reuse, Reproduce
[Figure: Methodology, Step 1 to Step 5.]
Related Work
● Script-language specific.
● Workflow-engine specific.
● A new language is needed.
● Outcome is not an executable workflow.
● Do not collect provenance data of the conversion process.
Two Kinds of Experts
● Scientists
– Domain experts who understand the experiment and the script (sometimes called users);
● Curators
– Scientists who are also familiar with workflow and script programming; or
– Computer scientists who are familiar enough with the domain to be able to implement our methodology;
– Responsible for authoring, documenting and publishing workflows and associated resources.
Requirements
● Requirement 1: Produce a workflow-like view of the script.
● Requirement 2: Create an executable workflow and compare the execution of the workflow and the script.
● Requirement 3: Modify the workflow resources.
● Requirement 4: Record provenance data.
● Requirement 5: Aggregate all resources to support Reproducibility and Reuse.
Requirements
● Requirement 1: Produce a workflow-like view of the script.
[Figure: Script-based experiment and the corresponding Abstract workflow, drawn as activities (Activity 1, Activity 2, ..., Activity n) with ports (Port 1, Port 2, Port 3, ..., Port n).]
Requirements
● Requirement 2: Create an executable workflow and compare the execution of the workflow and the script.
[Figure: Executable workflow and Script-based experiment.]
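One way to compare the two executions is to compare the files they produce. The sketch below is only an illustration under stated assumptions: the directory layout, the file names and the byte-for-byte comparison are hypothetical and are not taken from the slides.

```python
import hashlib
import tempfile
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()


def compare_outputs(script_dir, workflow_dir):
    """Map each output file name to True if both runs produced identical bytes."""
    script_dir, workflow_dir = Path(script_dir), Path(workflow_dir)
    names = {p.name for p in script_dir.iterdir() if p.is_file()}
    names |= {p.name for p in workflow_dir.iterdir() if p.is_file()}
    report = {}
    for name in sorted(names):
        a, b = script_dir / name, workflow_dir / name
        report[name] = a.is_file() and b.is_file() and file_digest(a) == file_digest(b)
    return report


def demo():
    """Build two tiny hypothetical output directories and compare them."""
    with tempfile.TemporaryDirectory() as tmp:
        script_run = Path(tmp, "script_run")
        workflow_run = Path(tmp, "workflow_run")
        script_run.mkdir()
        workflow_run.mkdir()
        (script_run / "trajectory.dcd").write_bytes(b"frames")
        (workflow_run / "trajectory.dcd").write_bytes(b"frames")
        (script_run / "analysis.dat").write_text("1.0\n")
        (workflow_run / "analysis.dat").write_text("1.1\n")
        return compare_outputs(script_run, workflow_run)
```

Byte-identical outputs may be too strict a criterion for simulations that are not deterministic; in that case a tolerance-based comparison of the analysis results would be a more suitable check.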
Requirements
● Requirement 3: Modify the workflow resources.
[Figure: panels (a) and (b); labels: Local, Algorithm A, Algorithm B.]
Requirements
● Requirement 4: Record provenance data.
[Figure: provenance graph with nodes Sample, Activity 1, Activity 2, Output 1, Output 2, WorkflowRun, Lucas and the timestamp “2012-06-01”, linked by used, wasGeneratedBy, wasStartedAt and wasAssociatedWith relations.]
Requirements
● Requirement 5: Aggregate all resources to support Reproducibility and Reuse.
[Figure: aggregated resources: Abstract workflows, Concrete workflows, Annotations, Papers and Reports, Provenance, Authors, Scripts, Data.]
Methodology
● Step 1: Generate Abstract Workflow
● Step 2: Create an executable workflow
● Step 3: Refine workflow
● Step 4: Annotate and check quality
● Step 5: Bundle Resources into a Research Object
[Figure labels: Script, Abstract workflow, Concrete workflow.]
Workflow Research Object (WRO)
● Research Objects are semantically rich aggregations of resources that bring together data, methods and people in scientific investigations.
● WROs encapsulate scientific workflows and additional information regarding their context and resources.
[Figure: Research Object Model.]
Running Example
● Molecular Dynamics Simulations
– Used in many branches of materials science, computational engineering, physics and chemistry.
– Scripts (shell script), programs (NAMD, VMD, Fortran)
– Phases: set up, simulation and analysis of trajectories.
– Inputs: protein structure, simulation parameters and force field files.
– Output: trajectories and analysis results.
Step 1: Generate Abstract Workflow
[Figure: Script code → (manually annotate) → Annotated script code → (create workflow-like view) → Abstract workflow.]
Step 1: Generate Abstract Workflow
● YesWorkflow (McPhillips et al., 2015)
– Annotations are written as code comments.
– Tags mark code blocks and their inputs/outputs: @begin, @end, @desc, @in, @out, ...
[Figure: Annotated script code → (create workflow-like view) → Abstract workflow.]
T. McPhillips et al. (2015), “YesWorkflow: A user-oriented, language-independent tool for recovering workflow information from scripts,” International Journal of Digital Curation, vol. 10, no. 1, pp. 298–313.
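To make the annotation idea concrete, here is a minimal sketch of how comment tags can be turned into a workflow-like view. It is not the YesWorkflow tool and does not reproduce its parser: the annotated shell fragment, the block names and the port names are hypothetical, and only flat (non-nested) blocks with the @begin, @end, @in and @out tags are handled.

```python
import re

# Hypothetical annotated shell script; block and port names are illustrative only.
ANNOTATED_SCRIPT = """\
# @begin setup @desc Prepare the simulation inputs
# @in structure
# @in parameters
# @out system
echo "setup"
# @end setup

# @begin simulation @desc Run the simulation
# @in system
# @out trajectory
echo "simulate"
# @end simulation
"""

# Matches a comment line that starts with one of the supported tags.
TAG = re.compile(r"#\s*@(begin|end|in|out)\s+(\S+)")


def extract_blocks(text):
    """Collect each annotated block with its input and output ports."""
    blocks, current = [], None
    for line in text.splitlines():
        match = TAG.match(line.strip())
        if not match:
            continue
        tag, name = match.groups()
        if tag == "begin":
            current = {"name": name, "in": [], "out": []}
        elif tag == "end":
            if current is not None and current["name"] == name:
                blocks.append(current)
                current = None
        elif current is not None:
            current[tag].append(name)
    return blocks


def data_links(blocks):
    """Link a producer block to a consumer block when they share a port name."""
    links = []
    for producer in blocks:
        for consumer in blocks:
            for port in producer["out"]:
                if port in consumer["in"]:
                    links.append((producer["name"], port, consumer["name"]))
    return links


blocks = extract_blocks(ANNOTATED_SCRIPT)
links = data_links(blocks)
```

The blocks become the activities of the abstract workflow, and the links become the data flow between their ports.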
Step 2: Create an executable workflow
● Create implementation of activities
– Copy code blocks from the script.
[Figure labels: Abstract workflow, Executable workflow, Script code.]
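The idea of giving each abstract activity an implementation and then executing the result can be sketched as follows. This is a toy illustration, not a workflow management system: the activity names follow the phases of the running example (set up, simulation, analysis), while the placeholder functions, port names and file names are assumptions. It relies on `graphlib`, available in Python 3.9 and later.

```python
from graphlib import TopologicalSorter


def run_workflow(activities, inputs):
    """Run activities in dependency order.

    activities maps a name to (function, input port names, output port name).
    """
    produced_by = {out: name for name, (_, _, out) in activities.items()}
    # An activity depends on whichever activity produces each of its inputs.
    graph = {
        name: {produced_by[port] for port in ins if port in produced_by}
        for name, (_, ins, _) in activities.items()
    }
    data = dict(inputs)
    order = []
    for name in TopologicalSorter(graph).static_order():
        func, ins, out = activities[name]
        data[out] = func(*(data[port] for port in ins))
        order.append(name)
    return data, order


# Placeholder implementations standing in for code blocks copied from the script.
ACTIVITIES = {
    "analysis": (lambda traj: f"analysis({traj})", ["trajectory"], "results"),
    "setup": (
        lambda structure, params: f"system({structure},{params})",
        ["structure", "parameters"],
        "system",
    ),
    "simulation": (lambda system: f"trajectory({system})", ["system"], "trajectory"),
}

data, order = run_workflow(
    ACTIVITIES, {"structure": "protein.pdb", "parameters": "sim.conf"}
)
```

The execution order is derived from the ports alone, so the activities can be declared in any order.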
Step 3: Refine executable workflow
● Create a new version by modifying resources:
– Algorithms
– Data Sets
– Parallelization
– Web Services
– ...
[Figure: Executable workflow → (create new version) → New workflow version.]
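A refinement such as replacing a local program with a web service can be modelled as creating a new workflow version instead of editing the existing one. The sketch below assumes a hypothetical in-memory representation; the class names, the implementation strings and the example URL are illustrative and do not come from the slides.

```python
from dataclasses import dataclass, replace
from typing import Optional, Tuple


@dataclass(frozen=True)
class Activity:
    name: str
    implementation: str  # hypothetical locator, e.g. a local program or a service URL


@dataclass(frozen=True)
class Workflow:
    version: int
    activities: Tuple[Activity, ...]
    derived_from: Optional[int] = None  # version this one was refined from


def refine(workflow, activity_name, new_implementation):
    """Return a new workflow version with one activity's implementation swapped."""
    if all(a.name != activity_name for a in workflow.activities):
        raise KeyError(activity_name)
    activities = tuple(
        replace(a, implementation=new_implementation) if a.name == activity_name else a
        for a in workflow.activities
    )
    return Workflow(
        version=workflow.version + 1,
        activities=activities,
        derived_from=workflow.version,
    )


v1 = Workflow(
    version=1,
    activities=(
        Activity("simulation", "local:simulation_program"),
        Activity("analysis", "local:analysis_program"),
    ),
)
v2 = refine(v1, "analysis", "https://example.org/analysis-service")
```

Keeping the earlier version untouched means both versions remain available for comparison and for recording how one was derived from the other.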
Steps 2 and 3
● Record provenance data: execution traces (W3C PROV).
[Figure: provenance graph for the Executable workflow, with nodes Sample, split, psgen, Output 1, Output 2, WorkflowRun, Lucas and the timestamp “2012-06-01”, linked by used, wasGeneratedBy, wasStartedAt, wasAssociatedWith, wasEnactedBy and hasSpecification relations.]
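An execution trace of this kind can be pictured as a list of (subject, relation, object) statements recorded while an activity runs. The sketch below uses plain Python tuples instead of a PROV library; the relation names follow the labels on the slide, and the behaviour of the `split` activity, the port names and the agent name are assumptions made for illustration.

```python
from datetime import datetime, timezone


class Trace:
    """Collects provenance statements as (subject, relation, object) tuples."""

    def __init__(self):
        self.statements = []

    def record(self, subject, relation, obj):
        self.statements.append((subject, relation, obj))

    def run_activity(self, name, func, inputs, agent):
        """Run func on a dict of named inputs and record what it used and generated."""
        self.record(name, "wasStartedAt", datetime.now(timezone.utc).isoformat())
        self.record(name, "wasAssociatedWith", agent)
        for input_name in inputs:
            self.record(name, "used", input_name)
        outputs = func(inputs)
        for output_name in outputs:
            self.record(output_name, "wasGeneratedBy", name)
        return outputs


def split(inputs):
    """Hypothetical stand-in for the 'split' activity: halve the sample."""
    sample = inputs["sample"]
    half = len(sample) // 2
    return {"output_1": sample[:half], "output_2": sample[half:]}


trace = Trace()
outputs = trace.run_activity("split", split, {"sample": [1, 2, 3, 4]}, agent="curator")
```

The recorded statements can later be serialized to a PROV format or queried during quality checks.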
Steps 2 and 3
● Record provenance data: conversion process (W3C PROV).
[Figure: provenance graph with nodes Script code, Executable workflow, New workflow version and Curator, linked by wasDerivedFrom and wasAssociatedWith relations.]
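Provenance of the conversion process makes it possible to trace any workflow version back to the script it came from. The following sketch walks `wasDerivedFrom` links; the identifiers and the exact set of links are assumptions, and each entity is assumed to have at most one parent.

```python
# Hypothetical derivation statements recorded during the conversion.
STATEMENTS = [
    ("executable_workflow", "wasDerivedFrom", "script"),
    ("new_workflow_version", "wasDerivedFrom", "executable_workflow"),
    ("new_workflow_version", "wasAssociatedWith", "curator"),
]


def lineage(entity, statements):
    """Follow wasDerivedFrom links from an entity back to its origin."""
    parents = {s: o for s, relation, o in statements if relation == "wasDerivedFrom"}
    chain = [entity]
    while chain[-1] in parents:
        parent = parents[chain[-1]]
        if parent in chain:
            raise ValueError(f"derivation cycle at {parent}")
        chain.append(parent)
    return chain


chain = lineage("new_workflow_version", STATEMENTS)
```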
Step 4: Annotate and check quality
● Annotations describing the workflow.
● Use provenance data
– To check the quality of the conversion process.
● Run checks to verify the soundness of the workflow.
Step 4: Annotate and check quality
[Figure labels: Script code, Executable workflow.]
Step 4: Annotate and check quality
[Figure labels: Workflow version, Initial executable workflow.]
Step 4: Annotate and check quality
● Common mistakes during the conversion:
– the main logical processing units in the script were not clearly identified;
– a mistake was made when migrating script code into the corresponding activity;
– the correct input files and parameters were not provided;
– the coding of the workflow itself contained errors.
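Some of these mistakes can be caught mechanically. As a minimal sketch, the check below reports activity inputs that are neither supplied to the workflow nor produced by another activity. The data layout and names are hypothetical, and the check only detects missing inputs, not incorrect ones or errors inside an activity's code.

```python
def check_inputs(activities, workflow_inputs):
    """Report input ports that nothing provides.

    activities maps a name to {"in": [...], "out": [...]}.
    """
    produced = {port for ports in activities.values() for port in ports["out"]}
    available = produced | set(workflow_inputs)
    problems = []
    for name, ports in sorted(activities.items()):
        for port in ports["in"]:
            if port not in available:
                problems.append(f"{name}: input '{port}' is never provided")
    return problems


ACTIVITIES = {
    "setup": {"in": ["structure", "parameters"], "out": ["system"]},
    "simulation": {"in": ["system"], "out": ["trajectory"]},
}

# The 'parameters' input is deliberately left out to trigger a report.
problems = check_inputs(ACTIVITIES, workflow_inputs=["structure"])
```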
Step 5: Bundle Resources into a Research Object
[Figure: bundled resources: Script, Abstract workflow, Concrete workflow(s), Annotations, Paper, Provenance, Data, Attributions.]
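Bundling can be thought of as writing a manifest that lists every aggregated resource together with its role. The sketch below builds such a manifest as JSON. It is a simplified illustration and not the Research Object serialization itself: the field names, resource kinds and file paths are all assumptions.

```python
import json


def build_manifest(resources, created_by):
    """Build a simple manifest from (path, kind) pairs."""
    return {
        "createdBy": created_by,
        "aggregates": [
            {"path": path, "kind": kind} for path, kind in sorted(resources)
        ],
    }


# Hypothetical bundle contents, one entry per kind of resource on the slide.
RESOURCES = [
    ("script/experiment.sh", "script"),
    ("workflow/abstract.gv", "abstract workflow"),
    ("workflow/concrete_v1.wf", "concrete workflow"),
    ("provenance/run1.prov", "provenance"),
    ("annotations/workflow.json", "annotations"),
    ("paper/paper.pdf", "paper"),
    ("data/input.pdb", "data"),
]

manifest = build_manifest(RESOURCES, created_by="curator")
manifest_json = json.dumps(manifest, indent=2)
```

A manifest like this gives a reader one place to find every resource needed to understand, rerun or reuse the experiment.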
Contributions
● A methodology that guides curators in a principled manner to transform scripts into reproducible and reusable WROs.
● This addresses an important issue in the area of script provenance.
Conclusions
● We addressed issues with respect to understanding, reuse and reproducibility of script-based experiments.
● The methodology created was:
– elaborated based on requirements;
– showcased via a real-world use case from the field of Molecular Dynamics.
● We exploited tools and standards from the scientific community:
– Scientific Workflows, YesWorkflow, Research Objects, the W3C PROV recommendations and the Web Annotation Data Model.
● The bundle is available at http://w3id.org/w2share/s2rwro/
Next Steps
● Evaluation using other case studies;
● Evaluation of the cost-effectiveness of our methodology;
● Extension of YesWorkflow to support the semantic annotation of blocks;
● Implementation of tools.
Acknowledgments
● FAPESP (grant # 2014/23861-4)
● CCES/CEPID (grant # 2013/08293-7)
– Center for Computational Engineering & Sciences
● LIS (Laboratory of Information Systems)
● Prof. Munir Skaf and his group from the Institute of Chemistry - Unicamp.