D. Koop, CIS 602-01, Fall 2016
CIS 602-01: Computational Reproducibility
Tools: Packaging
Dr. David Koop
Workflow Evolution Provenance of MTA Fare Data
[Figure: workflow diagram. Modules: HTTPFile (×3), CSVFile (×2), JSONFile, JoinTables, ProjectTable, GMapCell, TableCell, Map, MplBar, MplAxesProperties, MplFigureProperties, MplFigure, MplFigureCell, GetFareData (Group), DateRange (PythonSource), BuildLabels (PythonSource)]
Workflow Evolution Provenance
GMapCircleCell
delete module “GMapCell”
delete module “CellLocation”
delete module “ProjectTable”
delete module “SelectFromTable”
...
add module “SelectFromTable”
add parameter “float_expr” to “SelectFromTable” with value “latitude > 40.6”
delete parameter “float_expr” from “SelectFromTable”
add parameter “float_expr” to “SelectFromTable” with value “latitude > 40.7”
delete parameter “float_expr” from “SelectFromTable”
add parameter “float_expr” to “SelectFromTable” with value “latitude > 40.8”
...
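Because every change is stored as an action, any version of the workflow can be rebuilt by replaying its actions in order. A minimal sketch of that idea follows; the tuple format and the `replay` helper are hypothetical simplifications for illustration, not the VisTrails API.

```python
def replay(actions):
    """Rebuild a workflow (module name -> parameters) from a list of actions."""
    workflow = {}
    for action in actions:
        kind = action[0]
        if kind == "add_module":
            workflow[action[1]] = {}
        elif kind == "delete_module":
            del workflow[action[1]]
        elif kind == "add_parameter":
            _, module, name, value = action
            workflow[module][name] = value
        elif kind == "delete_parameter":
            _, module, name = action
            del workflow[module][name]
        else:
            raise ValueError(f"unknown action: {kind}")
    return workflow


# Hypothetical log mirroring the parameter exploration shown on the slide.
log = [
    ("add_module", "SelectFromTable"),
    ("add_parameter", "SelectFromTable", "float_expr", "latitude > 40.6"),
    ("delete_parameter", "SelectFromTable", "float_expr"),
    ("add_parameter", "SelectFromTable", "float_expr", "latitude > 40.7"),
]

# Replaying a prefix of the log recovers an earlier version.
first_version = replay(log[:2])
latest_version = replay(log)
print(first_version)
print(latest_version)
```

Storing actions rather than full workflow snapshots keeps each version small and makes the difference between two versions explicit.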
Execution Provenance
[Figure: workflow diagram. Modules: vtkStructuredPointsReader, vtkContourFilter, vtkDataSetMapper, vtkActor, vtkCamera, vtkRenderer, VTKCell]
<module id="12" name="vtkDataSetReader" start_time="2010-02-19 11:01:05" end_time="2010-02-19 11:01:07">
  <annotation key="hash" value="c54bea63cb7d912a43ce"/>
</module>
<module id="13" name="vtkContourFilter" start_time="2010-02-19 11:01:07" end_time="2010-02-19 11:01:08"/>
<module id="15" name="vtkDataSetMapper" start_time="2010-02-19 11:01:09" end_time="2010-02-19 11:01:12"/>
<module id="16" name="vtkActor" start_time="2010-02-19 11:01:12" end_time="2010-02-19 11:01:13"/>
<module id="17" name="vtkCamera" start_time="2010-02-19 11:01:13" end_time="2010-02-19 11:01:14"/>
<module id="18" name="vtkRenderer" start_time="2010-02-19 11:01:14" end_time="2010-02-19 11:01:14"/>
...
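Execution records like these can be queried after the fact, for example to see how long each module ran. The sketch below parses three of the records from the slide with Python's standard library; the enclosing `<log>` root element and the `module_durations` helper are assumptions added so the fragment parses as one document.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

# Records copied from the slide, wrapped in a hypothetical <log> root element.
LOG = """
<log>
  <module id="12" name="vtkDataSetReader" start_time="2010-02-19 11:01:05" end_time="2010-02-19 11:01:07">
    <annotation key="hash" value="c54bea63cb7d912a43ce"/>
  </module>
  <module id="13" name="vtkContourFilter" start_time="2010-02-19 11:01:07" end_time="2010-02-19 11:01:08"/>
  <module id="15" name="vtkDataSetMapper" start_time="2010-02-19 11:01:09" end_time="2010-02-19 11:01:12"/>
</log>
"""

TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def module_durations(xml_text):
    """Return (module name, seconds elapsed) for each <module> record."""
    durations = []
    for module in ET.fromstring(xml_text.strip()).iter("module"):
        start = datetime.strptime(module.get("start_time"), TIME_FORMAT)
        end = datetime.strptime(module.get("end_time"), TIME_FORMAT)
        durations.append((module.get("name"), (end - start).total_seconds()))
    return durations


durations = module_durations(LOG)
for name, seconds in durations:
    print(f"{name}: {seconds:.0f}s")
```

The `hash` annotation on the reader record could be compared across runs in the same way to check whether the input data changed.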
VisTrails Demo
• Download data from web
• Process it via tools from the tabledata package
• Tag specific versions
• Run them from the command-line
Assignment 2
• http://www.cis.umassd.edu/~dkoop/cis602/assignment2.html
• Updated this past weekend with better infrastructure to run the workflows in Docker
• Keep your project on GitHub, keep images on Docker Hub
• Put a link to your Docker Hub images in your GitHub README.md
• Note the difference between CMD and ENTRYPOINT in Docker
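The CMD versus ENTRYPOINT distinction can be modeled in a few lines. This is a sketch of the documented exec-form behavior, not Docker's implementation; `container_argv`, `run.py`, and the argument values are hypothetical.

```python
def container_argv(entrypoint, cmd, run_args):
    """Model the command a container runs (exec form only).

    entrypoint: the image's ENTRYPOINT list (may be empty)
    cmd:        the image's CMD list, used as default arguments
    run_args:   arguments given after the image name in `docker run`
    """
    # Arguments on the `docker run` command line replace CMD entirely...
    arguments = run_args if run_args else cmd
    # ...but are appended to ENTRYPOINT, which stays in place.
    return list(entrypoint) + list(arguments)


# Image A uses only CMD: run-time arguments replace the whole command.
cmd_default = container_argv([], ["python", "run.py"], [])
cmd_overridden = container_argv([], ["python", "run.py"], ["bash"])

# Image B uses ENTRYPOINT plus CMD: run-time arguments replace only the defaults.
ep_default = container_argv(["python", "run.py"], ["--all"], [])
ep_overridden = container_argv(["python", "run.py"], ["--all"], ["--workflow", "fares"])

print(cmd_overridden)
print(ep_overridden)
```

With only CMD, passing arguments to `docker run` swaps out the program itself; with ENTRYPOINT, the program is fixed and the arguments are passed to it, which suits an image meant to run one workflow with different options.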
Projects
• Survey:
  - Have not heard from many of you about the paper you will be reproducing…
  - Due tomorrow, will reply or assign
• Research:
  - How can you test if your work is reproducible?
  - Why are the specific approaches (code versioning, containers) important to your research?
  - What does your work add to existing reproducibility tools?
Covered Topics
• Version Control (Code)
• Data Sharing, Citation, and Availability
• Virtual Machines
• Containers
• Scientific Workflows
• Provenance
Tools
• We have seen specific tools, too:
  - Git, GitHub
  - DOIs, Dryad, DataONE, figshare
  - Xen, Parallels, EC2
  - Docker
  - Pegasus, Kepler, VisTrails, Taverna
  - PDIFF, Analogies, VisComplete
• More at ReproMatch
These are all ingredients; how do they work together?
Packaging
• If I want to replicate/reproduce something,
  - How do I ensure that I have everything I need?
  - How do I package it up so that someone else can access it?
• Two Tools
  - CDE
  - ReproZip
CDE: Using System Call Interposition to Automatically Create Portable Software Packages
P. J. Guo and D. Engler
CDE
• Mimic the specific system pieces required in a user directory
• Capture via trace
• Intercept system calls (both when capturing and when running)
• Linux-specific
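The idea of mimicking system pieces in a user directory can be sketched without real system call interposition. The code below assumes the list of accessed files has already been obtained from a trace; `redirect`, `package_files`, and the directory names are hypothetical illustrations for a POSIX system, not CDE's actual code.

```python
import os
import shutil
import tempfile


def redirect(path, package_root):
    """Map an absolute path to its mirrored location inside the package."""
    return os.path.join(package_root, os.path.abspath(path).lstrip(os.sep))


def package_files(accessed_paths, package_root):
    """Copy every traced file into the package, preserving its directory layout."""
    for path in accessed_paths:
        target = redirect(path, package_root)
        os.makedirs(os.path.dirname(target), exist_ok=True)
        shutil.copy2(path, target)


# Demo with a temporary file standing in for a traced dependency.
workdir = tempfile.mkdtemp()
source = os.path.join(workdir, "system", "lib", "data.txt")
os.makedirs(os.path.dirname(source))
with open(source, "w") as handle:
    handle.write("dependency")

package_root = os.path.join(workdir, "package-root")
package_files([source], package_root)

# At run time, an intercepted open(source) would be redirected here instead.
with open(redirect(source, package_root)) as handle:
    packaged_contents = handle.read()
print(packaged_contents)
```

The same path mapping is used in both phases: while capturing it decides where to copy each file, and while running it decides which copy an intercepted call should see.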
ReproZip: Computational Reproducibility With Ease
F. Chirigati, R. Rampin, D. Shasha, and J. Freire
ReproZip extensions to existing packaging tools
• Focus on without-forethought reproducibility
• Other tools: PTU, CARE, CDE
• Portability: runs using VMs or containers
• Extensibility: may port to other environments using other unpackers
• Modifiability: identifies inputs, outputs, parameters
• Usability: command-line tool and control over the trace
What should tools look like?
• "What we need right now is scientists actually using stuff that already exists, not engineers building new stuff that no one will ever use" (C. Titus Brown)