G3 talk rld_2
Tools for:
Open-Source
Open-Data
Rob L Davidson about.me/rob.davidson
The problem
reproducibility.cs.arizona.edu
• 515 papers (429 conf, 86 journal)
• <30% reproducible
The Cause
• Stodden 2010
– 638 registrants at NIPS
• 30% share code
• 20% share data
http://web.stanford.edu/~vcs/papers/SMPRCS2010.pdf
Publishers must provide!
Hosting
Curating
Citations for everything:
data, tools + workflows
Tools for Reproducibility
• Data: GigaDB
• Images: OMERO
• Workflows
– Galaxy
– Executable Docs
– VMs
GigaDB
github.com/gigascience/gigadb-cogini
Hosting all data
Hosting all research objects
Impact for research objects
• Host
• Curate
• Share
• Cite - DOI
Even more accessible, transparent data?
Hosting image data with OMERO
Hosting Images
• Image LIMS
• Web embedding
– View online, no need for software
• Full res
• Link all images to publication
– No cherry picking
http://www.openmicroscopy.org/site/products/omero
NO
Cyber-Centipedes! Phenotyping
Accessible Cyber-Centipede images
OMERO: providing access to imaging data
View, filter, measure raw images with direct links from journal article.
See all image data, not just cherry picked examples.
Download and reprocess.
OMERO: Adding value
The alternative...
...look but don't touch
Workflows
1. Galaxy
galaxyproject.org
galaxy.cbiit.cuhk.edu.hk
Implement workflows in a community-accepted format
http://galaxyproject.org
Over 45,000 main Galaxy server users
Over 1,000 papers citing Galaxy use
Over 55 Galaxy servers deployed
Open source
Galaxy interface: Tool list, Tool parameterisation, Results panel (Copyright NBAF-B 2013)
Implement workflows in an intuitive format
Visualising Workflows
Birmingham Metabo-Galaxy Workflow
Birmingham Metabo-Galaxy
• Tools wrapped in Python and XML
• User sees web form (easy!)
• Data stored centrally (secure!)
• Work done centrally (easy update)
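To make "tools wrapped in Python and XML" concrete, here is a minimal sketch of a Galaxy-style tool description generated from Python. It is not taken from the Birmingham Metabo-Galaxy code: the tool id, name, script name and file formats are all hypothetical, and the element layout only follows the general shape of a Galaxy tool XML file.

```python
import xml.etree.ElementTree as ET


def build_tool_xml(tool_id, name, script):
    """Return a Galaxy-style tool description as an XML string.

    All values passed in are illustrative; a real wrapper would describe
    the actual script, its parameters and its data formats.
    """
    tool = ET.Element("tool", id=tool_id, name=name, version="0.1.0")

    # Command line that the server would run for the user.
    command = ET.SubElement(tool, "command")
    command.text = f"python '$__tool_directory__/{script}' '$input' '$output'"

    # Each <param> becomes one field in the web form the user sees.
    inputs = ET.SubElement(tool, "inputs")
    ET.SubElement(inputs, "param", name="input", type="data",
                  format="tabular", label="Input table")

    # Each <data> element is a result stored centrally on the server.
    outputs = ET.SubElement(tool, "outputs")
    ET.SubElement(outputs, "data", name="output", format="tabular")

    help_text = ET.SubElement(tool, "help")
    help_text.text = "Hypothetical example wrapper."

    return ET.tostring(tool, encoding="unicode")


if __name__ == "__main__":
    print(build_tool_xml("example_filter", "Example filter", "filter.py"))
```

The user never sees this file: the server turns the inputs section into a web form, which is why updating a tool centrally updates it for everyone.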
Hosting Workflows
1) Test data
2) Software files
3) Instructions
+ Galaxy implementation
Can we reproduce results? SOAPdenovo2 S. aureus pipeline
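One simple way to answer "can we reproduce results?" is to re-run a pipeline on the hosted test data and compare the output with the hosted reference output. The sketch below does that by checksum. The function names and file paths are hypothetical, and it is not the procedure used for the SOAPdenovo2 case study. Byte-identical output is a strict test: pipelines with non-deterministic steps may need a looser comparison.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_match(reference_path, rerun_path):
    """True if the re-run output is byte-identical to the reference."""
    return sha256_of(reference_path) == sha256_of(rerun_path)
```

Usage would look like `outputs_match("reference/contigs.fa", "rerun/contigs.fa")`, where both paths are placeholders for the hosted result and the locally regenerated one.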
Workflows
2. Executable Docs
Open lab books, dynamic documents
• Facilitate reuse and sharing with tools like: Knitr, Sweave, iPython Notebook
Sweave
• Working towards executable papers…
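The idea behind an executable document is that the numbers in the text are computed by the code when the document is built, never typed in by hand. The sketch below illustrates that principle in plain Python. It is a stand-in for what Knitr, Sweave or a notebook do, not an example of their syntax, and the function name and sentence template are hypothetical.

```python
import statistics


def render_report(measurements):
    """Build a results sentence whose numbers come from the data itself."""
    mean = statistics.mean(measurements)
    stdev = statistics.stdev(measurements)  # sample standard deviation
    return (f"We analysed {len(measurements)} samples; "
            f"mean = {mean:.2f}, SD = {stdev:.2f}.")


if __name__ == "__main__":
    # prints: We analysed 3 samples; mean = 2.00, SD = 1.00.
    print(render_report([1.0, 2.0, 3.0]))
```

If the data change, rebuilding the document changes the sentence, so the text and the analysis cannot drift apart.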
E.g.
Some testimonials for Knitr
Authors (Wolfgang Huber)
“I do all my projects in Knitr. Having the textual explanation, the associated code and the results all in one place really increases productivity, and helps explaining my analyses to colleagues, or even just to my future self.”
Reviewers (Christophe Pouzat)
“It took me a couple of hours to get the data, the few custom developed routines, the “vignette” and to REPRODUCE EXACTLY the analysis presented in the manuscript. With few more hours, I was able to modify the authors’ code to change their Fig. 4. In addition to making the presented research trustworthy, the reproducible research paradigm definitely makes the reviewer’s job much more fun!”
Workflow accessibility:
VMs
Why VMs?
• OS settings
• Dependencies
– Versions
– e.g. python!
• Data + Code linked
• Download or run in cloud
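A VM freezes the OS settings and dependency versions listed above. A much lighter complement, sketched below, is to record those versions alongside the results so that a later re-run can be checked against them. The function name and the package list are hypothetical, and this is not a replacement for a VM, since it records the environment but does not recreate it.

```python
import json
import platform
from importlib import metadata


def capture_environment(packages):
    """Record interpreter, OS and package versions for a results manifest."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return {
        "python": platform.python_version(),
        "os": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # The package names here are only examples.
    print(json.dumps(capture_environment(["numpy", "pip"]), indent=2))
```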
VMs in GigaDB
Summary
Share data in GigaDB
Share all images in GigaDB
-View images via OMERO
Share code in GigaDB!
Share pipeline using:
Executable docs!
Galaxy!
VMs!
Give us data, papers & pipelines*
Improve reproducibility!
Contact us:
[email protected] [email protected] [email protected]
* APCs currently generously covered by BGI until 2015
www.gigasciencejournal.com
Ruibang Luo (BGI/HKU)
Shaoguang Liang (BGI-SZ)
Tin-Lap Lee (CUHK)
Qiong Luo (HKUST)
Senghong Wang (HKUST)
Yan Zhou (HKUST)
Thanks to:
@gigascience
facebook.com/GigaScience
blogs.biomedcentral.com/gigablog/
Peter Li
Huayan Gao
Chris Hunter
Jesse Si Zhe
Nicole Nogoy
Laurie Goodman
Amye Kenall (BMC)
Marco Roos (LUMC)
Mark Thompson (LUMC)
Jun Zhao (Lancaster)
Susanna Sansone (Oxford)
Philippe Rocca-Serra (Oxford)
Alejandra Gonzalez-Beltran (Oxford)
www.gigadb.org
galaxy.cbiit.cuhk.edu.hk
CBIIT
Funding from:
Our collaborators:
team:
Case study: