The Open Science Framework (OSF) at Notre Dame: Connecting the Workflow and
Supporting the Research Mission
CNI Fall 2015 Membership Meeting Washington, DC
Andrew Sallans, Partnerships Lead, Center for Open Science
Natalie Meyers, E-Research Librarian, University of Notre Dame
The OSF at Notre Dame, CNI Fall 2015
12/15/2015 CNI Fall Mtg https://osf.io/s5e2b/
OSF Extensions & Pilots @ ND
I want to preserve my simulation method and results so other people can try it out.
data output
DOI:10.XXXX DOI:10.ZZZZ
DOI:10.CCCC
DOI:10.YYYY
mysim.exe -in data -out output -p 10
… and repeat this 1M times with different -p values.
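To give a feel for what "repeat this 1M times" means for preservation, here is a minimal Python sketch that generates one command line per -p value. The per-run output name (output_p10, output_p11, ...) and the helper name sweep_commands are assumptions made for illustration; the slide itself only shows a single run writing to "output".

```python
import shlex


def sweep_commands(p_values, exe="mysim.exe", data="data", out_prefix="output"):
    """Yield one command line per -p value, each with its own output name.

    The output naming scheme is hypothetical; it only exists so that
    runs in the sweep do not overwrite each other.
    """
    for p in p_values:
        args = [exe, "-in", data, "-out", f"{out_prefix}_p{p}", "-p", str(p)]
        # Quote each argument so the line is safe to paste into a shell.
        yield " ".join(shlex.quote(arg) for arg in args)


# A tiny sweep of three values; the talk's scenario would use about a million.
commands = list(sweep_commands(range(10, 13)))
for line in commands:
    print(line)
```

Each generated run produces a distinct output that may need its own identifier, which is where the DOI question on this slide comes from.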
But it’s not that simple!
I want to preserve my simulation method and results so other people can try it out.
data output
mysim.exe -in data -out output -p 10
config
calib HTTP GET
Green Goat Linux 57.83.09.B
libsimruby
X86-64 CPU / 64GB RAM / 200GB Disk
SIM_MODE=clever
Your application works perfectly today on your machine.
Will your application still work next month?
Will your application still work next year?
Will your application still work 10 years later?
Will your application still work today on another machine?
Challenges of Reproducible Computing
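One small step toward answering these questions is to record what the run actually depended on. The following is a minimal sketch, using only the Python standard library, that captures a few facts about the current host and writes them as JSON. The function name capture_environment and the choice of fields are assumptions for illustration; SIM_MODE is the environment variable from the example above, and a real record would also need the OS distribution, library versions, and input data.

```python
import json
import os
import platform
import sys


def capture_environment(env_vars=("SIM_MODE",)):
    """Collect a small, JSON-serializable description of the current host."""
    return {
        "machine": platform.machine(),      # CPU architecture as reported by the OS
        "system": platform.system(),        # e.g. Linux, Darwin, Windows
        "release": platform.release(),      # kernel or OS release string
        "python": sys.version.split()[0],   # interpreter version
        "cpu_count": os.cpu_count(),        # may be None if it cannot be determined
        # Record only the variables we were asked about; None means "not set".
        "environ": {name: os.environ.get(name) for name in env_vars},
    }


manifest = capture_environment()
print(json.dumps(manifest, indent=2, sort_keys=True))
```

Storing such a manifest next to the results does not make a run reproducible by itself, but it documents hidden dependencies that would otherwise be lost.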
The DASPOS Project Team includes computer science experts from the University of Notre Dame and the University of Chicago, physicists from the ATLAS and CMS experiments at the LHC, the DØ experiment at the Tevatron, experts in other data-intensive fields such as bioinformatics and astrophysics, and digital librarians with broad experience in the preservation of large datasets in the sciences and humanities.
The DASPOS project has been funded in whole or in part with Federal funds from the National Science Foundation, under Award No. 1247316.
daspos.org
Goal and Scope of Project
The goal of DASPOS is to “scout out” solutions to the most pressing technical problems, and make them available to those constructing preservation systems. In particular, this project will:
• Establish a dialogue with other fields facing preservation and re-use issues with Big Data. Identify areas of commonality and outline where solutions diverge due to specific needs.
• Develop metadata to support the preservation and re-use of HEP data, and its related software and computational algorithms. Design the metadata so as to meet the needs of as many other fields as possible for wide re-use.
• Define a reference architecture for a data preservation system targeted for HEP but coordinated with other fields. Include decision points where policy choices impact the architectural structure.
• Develop a preservation validation test-bed on which a technical implementation of the reference architecture can be developed and constructed.
• Perform a Curation Challenge, where a physics data analysis is conducted based solely on curated and archived data.
VecNet’s Malaria Modelers - Share Simulations & Results
VecNet Digital Library
Our digital library software stack and features were first developed and presented for beta feedback in 2013:
Brower D, Lakshminarayanan B, Meyers N. Multiple Identities: Managing Authorities in Repositories and Digital Collections presented at American Library Association Annual Conference, Chicago, IL, 2013.
and then again at last year’s ACM/IEEE JCDL conference:
Barker M, Brower D, and Meyers N. Vector-Borne Disease Network Digital Library presented at Digital Libraries 2014, IEEE (978-1-4799-5569-5), London, UK, Sept 9, 2014.
dl.vecnet.org Vector-borne Disease Network
VecNet digital library supports mathematical modeling of malaria transmission & eradication.
It is a repository for curating & sharing information about simulations used to model malaria transmission & the impact of interventions
Contains: field, lab, survey, climate, demographic, and simulation data, input file code snippets, input file sets for models, simulations, tagged bibliographic citations, articles, maps, reports and more on entomology, epidemiology, demography, climatology, and interventions
VecNet Digital Library
Boehm, R. and Meyers N. Repository Platforms for Research Data: VecNet Use Case presented at Research Data Alliance (RDA) 6th Plenary Meeting, Paris, Sept 25, 2015.
Dynamic Data Citation & Repositories for Research Data
Meyers N. Dynamic Data Citation: VecNet Use Case presented at Federation of Earth Science Information Partners’ Winter Meeting Dynamic Data Citation Workshop, Washington, D.C., Jan 8, 2015.
Meyers N. VecNet Digital Library & Data Citation for Simulations presented at Institute for Disease Modeling 3rd Annual Modeling Symposium, Bellevue, Washington, April 22, 2015
• Attended Andrew Sallans’ talk “Improving Integrity, Transparency, and Reproducibility Through Connection of the Scholarly Workflow” during NISO’s virtual conference Scientific Data Management: Caring for Your Institution and its Intellectual Wealth, February 18, 2015.
• Attended Open Repositories ’15 and was attracted to OSF features.
• Hosted a panel presentation of the CoS Reproducibility Projects at Notre Dame’s Center for Digital Scholarship, Sept 9, 2015.
Getting to Know OSF
• Integrating our institutional repository with OSF (CAS authentication)
• Embarked on NDS Dashboard integration with CRC & Ian Taylor
• Piloting registration of select VecNet malaria data files in OSF
• Testing Umbrella software preservation tool interactivity with OSF (OpenMalaria simulation execution use case)
• Creating and documenting a reproducible development environment for the OSF framework: OpenStack images to run OSF frontend and backend services on CRC resources, and Vagrant/VirtualBox files for use by developers on their laptops (ongoing)
OSF Related Ongoing Efforts at ND
Why OSF and an Institutional Repository?
1. Why integrate OSF with CurateND? -> Start staging data for preservation & initial sharing between collaborators
2. Institutional Branding and Central Authentication -> Fosters ease of use & trust among institutional researchers
3. Group Role Enhancements -> Hierarchical lab roles
4. Storage Source Configuration -> Flexibility of resources
5. Integration with Computational Environment -> Access to HPC & reuse
6. Metadata Enhancements to OSF -> Incrementally & automatically add metadata prior to a preservation phase effort
7. Push OSF Project Snapshot (aka Registration) to CurateND -> Easy deposit to institutional repository preservation storage encourages institutional data preservation
CurateND Institutional Repository OSF Integration
Contact: Rick Johnson [email protected]
NDS OSF Dashboard integration
http://www.nationaldataservice.org/
http://ndspilot.com
Contact: Ian Taylor
bitbucket.org/nds-org/nds-dashboard
Umbrella: Ensuring executable software preservation & reuse
http://ccl.cse.nd.edu/software/umbrella
A Portable Environment Creator for Reproducible Computing on Clusters, Clouds, and Grids
• Specify the execution environment clearly
  -- Hardware, Kernel, OS, Software, Data, Environment Variables
• Materialize the execution environment at runtime automatically
  -- No need to configure environment manually
  -- Matching evaluation & choose minimal mechanism
• Loosely coupled with sandbox techniques
  -- Parrot, chroot, VM, Docker
• Construct sandbox through mounting mechanisms without copying
  -- Multiple namespaces can be constructed concurrently
• Utilize more computing resources
  -- Local Machine, Grid, Cloud
Umbrella Features
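To make "specify the execution environment clearly" and "matching evaluation" concrete, here is a minimal Python sketch. It is not the actual Umbrella specification format or its matching logic: the section names simply follow the six categories listed above, the field names (arch, memory_gb, disk_gb) and both helper functions are hypothetical, and the values reuse the fictional example from earlier in this talk. The kernel entry is an assumption, since the earlier slide names only the OS.

```python
import json

# The six categories named on the Umbrella features slide.
REQUIRED_SECTIONS = ("hardware", "kernel", "os", "software", "data", "environ")

# Illustrative specification; every field name here is an assumption.
spec = {
    "hardware": {"arch": "x86_64", "memory_gb": 64, "disk_gb": 200},
    "kernel": {"name": "linux"},  # assumed; not stated in the example
    "os": {"name": "Green Goat Linux", "version": "57.83.09.B"},
    "software": ["libsimruby"],
    "data": {"config": "local file", "calib": "fetched via HTTP GET"},
    "environ": {"SIM_MODE": "clever"},
}


def missing_sections(candidate):
    """Return the required sections that the specification does not declare."""
    return [name for name in REQUIRED_SECTIONS if name not in candidate]


def hardware_satisfied(candidate, host):
    """Check whether a host description meets the declared hardware needs."""
    need = candidate["hardware"]
    return (
        host["arch"] == need["arch"]
        and host["memory_gb"] >= need["memory_gb"]
        and host["disk_gb"] >= need["disk_gb"]
    )


big_host = {"arch": "x86_64", "memory_gb": 128, "disk_gb": 500}
small_host = {"arch": "x86_64", "memory_gb": 16, "disk_gb": 500}

print(json.dumps(spec, indent=2))
print(missing_sections(spec))
print(hardware_satisfied(spec, big_host), hardware_satisfied(spec, small_host))
```

When the local machine does not satisfy the specification, a tool in this space would fall back to a heavier mechanism such as a VM or a cloud instance, which is the idea behind choosing the minimal mechanism.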
Makes Applications Portable and Reproducible
Umbrella & OSF The Open Malaria Use Case
A Tool for Ensuring executable software preservation & reuse
Umbrella & Open Malaria Use Case Contacts
Haiyan Meng [email protected]
Douglas Thain [email protected]
Please use the following citation for Umbrella in a scientific publication: Haiyan Meng and Douglas Thain, Umbrella: A Portable Environment Creator for Reproducible Computing on Clusters, Clouds, and Grids, Workshop on Virtualization Technologies in Distributed Computing (VTDC) at HPDC, June, 2015. DOI: 10.1145/2755979.2755982
For more information about Umbrella: The Cooperative Computing Lab, http://ccl.cse.nd.edu
Alex [email protected]
About Open Malaria Use Case: The Center for Research Computing, http://crc.nd.edu
Learning from DASPOS, Umbrella & NDS / OSF
• Repositories: Will they take provisional data, active data ...?
• Compatibility: Can we plug into existing tools? Diff? Jupyter?
• Software Preservation Layers: Preserve program binaries, or sources + compilers, or something else? (Parrot, Umbrella, Prune ...)
• Naming: Tension between usability and durability: URLs, DOIs, PIDs, UUIDs, HMACs, ...
• Complexity of Composition: Connect systems together? NDS? OSF? CurateND?
• Citation: Dynamic? Static? For publication? For reuse?
• Usability: Do users have to change behavior?
• Overhead: Tools must be close to native performance, or they won’t get used.
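The naming tension can be illustrated with three kinds of identifier from the Python standard library. This is a sketch under stated assumptions, not a recommendation from the talk: the payload bytes and the secret key are made up, and the presentation URL is reused only as a convenient name for the name-based UUID.

```python
import hashlib
import hmac
import uuid

# Hypothetical content of one simulation output file.
payload = b"mysim output for -p 10\n"

# Content hash: durable and verifiable, but changes whenever the bytes change
# and is not friendly to read or type.
content_id = hashlib.sha256(payload).hexdigest()

# Random UUID: unique without coordination, but says nothing about the content.
random_id = str(uuid.uuid4())

# Name-based UUID: always the same for the same name, so it can be recomputed.
name_id = str(uuid.uuid5(uuid.NAMESPACE_URL, "https://osf.io/s5e2b/"))

# HMAC: like a content hash, but only holders of the key can produce or check it.
secret_key = b"hypothetical-secret-key"
tag = hmac.new(secret_key, payload, hashlib.sha256).hexdigest()

for label, value in [
    ("sha256", content_id),
    ("uuid4", random_id),
    ("uuid5", name_id),
    ("hmac", tag),
]:
    print(f"{label:7s} {value}")
```

URLs and DOIs sit at the usable end of this spectrum because they resolve to a landing page, but they depend on an organization keeping the resolver alive; hashes and UUIDs need no resolver but give a person nothing to click.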
NDS Dashboard enhancements, including backend container toolkit development:
• Fix bugs that cause exceptions for valid operations.
• Optimize the toolkit to reduce the time taken to perform tasks.
• Implement post operation to support the uploading of files into OSF storage providers.
• Automate the uploading of the diff of the containers for each run into OSF storage.
• Support VNC on a container, so users can pull up a remote desktop to a container and view remote desktop apps, e.g. Pegasus workflow.
• Backend integration of Jupyter notebooks.
• Front-end spawning of these notebooks with state management, i.e. spawn a notebook, allow editing, and then copy the edited content back into OSF storage to update content.
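The idea of uploading only the diff of a container for each run can be sketched as a comparison of two file snapshots. This is a minimal illustration, not the NDS toolkit's implementation: in-memory dictionaries stand in for container file systems, the function names are hypothetical, and the actual upload to OSF storage is left out.

```python
import hashlib


def snapshot(files):
    """Map each path to the SHA-256 digest of its content.

    `files` is a dict of path -> bytes standing in for a container file system.
    """
    return {path: hashlib.sha256(content).hexdigest() for path, content in files.items()}


def diff_snapshots(before, after):
    """Return the paths added, removed, or changed between two snapshots."""
    added = sorted(set(after) - set(before))
    removed = sorted(set(before) - set(after))
    changed = sorted(
        path for path in set(before) & set(after) if before[path] != after[path]
    )
    return {"added": added, "removed": removed, "changed": changed}


# State before and after one hypothetical simulation run.
before = snapshot({"config": b"p=10\n", "calib": b"v1\n"})
after = snapshot({"config": b"p=11\n", "calib": b"v1\n", "output": b"result\n"})

delta = diff_snapshots(before, after)
print(delta)
```

Only the paths listed under "added" and "changed" would need to be uploaded after a run, which keeps storage and transfer costs proportional to what the run actually modified.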
OSF ND Immediate Next Efforts
OSF can be useful in other projects:
• Spatial Repellents Trial: piloting use for master files management
• EU-funded Switch Project is considering use of the OSF
  http://www.switchproject.eu/
• Collaboration with USC's Information Sciences Institute on RACE: Repository and Workflows for Accelerating Circuit Realization.
  • RACE is developing a trusted repository for integrated circuit designs.
  • OSF / NDS Dashboard can be extended and integrated with the Pegasus Workflow system and interfaced to CurateND for long-term preservation of circuit designs.
Potential Future OSF Projects
Contact:
Andrew Sallans, Partnerships Lead, Center for Open Science
Natalie Meyers, E-Research Librarian, University of Notre Dame
More Info:
osf.io
cos.io
crc.nd.edu
library.nd.edu
daspos.org
vecnet.org
ccl.cse.nd.edu
nationaldataservice.org
Link to this Presentation: https://osf.io/s5e2b/