Rob Davidson at the G3 Workshop: Open Source - Tools for Reproducibility

Post on 27-Nov-2014

DESCRIPTION

Rob Davidson at the G3 (Great GigaScience & Galaxy) Workshop: Open Source - Tools for Reproducibility. University of Melbourne, 19th September 2014

TRANSCRIPT

Tools for:

Open-Source
Open-Data

Rob L Davidson
about.me/rob.davidson
www.slideshare.net/RobertDavidson6/g3-talk-rld2

The problem

reproducibility.cs.arizona.edu
• 515 papers (429 conf, 86 journal)
• <30% reproducible

The Cause

• Stodden 2010
– 638 registrants at NIPS
• 30% share code
• 20% share data

http://web.stanford.edu/~vcs/papers/SMPRCS2010.pdf

Publishers must provide!
Hosting
Curating

Citations for everything: data, tools + workflows

Tools for Reproducibility

• Data: GigaDB
• Images: OMERO
• Workflows
– Galaxy
– Executable Docs
– VMs

GigaDB
github.com/gigascience/gigadb-cogini

Hosting all data

Hosting all research objects

Impact for research objects

• Host
• Curate
• Share
• Cite - DOI

Even more accessible, transparent data?
Hosting image data with OMERO

Re-producing Images
• Image LIMS
• Keeps metadata with image
• Means the image can be found later!
• Image can be understood
• Also some processing options

http://www.openmicroscopy.org/site/products/omero

Accessible, transparent Images
• Embed in web
• Full res
• View without special software
• Adjust contrast etc
• Link all images to pub!

No cherry picking!

http://www.openmicroscopy.org/site/products/omero

NO

Cyber-Centipedes! Phenotyping

Accessible Cyber-Centipede images

OMERO: providing access to imaging data

View, filter, measure raw images with direct links from journal article.

See all image data, not just cherry picked examples.

Download and reprocess.

OMERO: Adding value

The alternative...

...look but don't touch

Workflows 1. Galaxy

galaxyproject.org

galaxy.cbiit.cuhk.edu.hk

Implement workflows in a community-accepted format

http://galaxyproject.org

Over 45,000 main Galaxy server users

Over 1,000 papers citing Galaxy use

Over 55 Galaxy servers deployed

Open source

Tool list, Tool parameterisation, Results panel (Copyright NBAF-B 2013)

Implement workflows in an intuitive format

Visualising Workflows

Birmingham Metabo-Galaxy Workflow

Birmingham Metabo-Galaxy

• Tools wrapped in Python and XML
• User sees web form (easy!)
• Data stored centrally (secure!)
• Work done centrally (easy update)
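As a rough illustration of what "wrapped in Python and XML" can look like, the sketch below uses only the Python standard library to generate a minimal Galaxy-style tool description. The tool id, the script name `line_count.py`, and the parameter names are hypothetical and not taken from the talk; a production wrapper would normally carry more detail (tests, help text, requirements).

```python
import xml.etree.ElementTree as ET


def build_tool_xml(tool_id, name, script):
    """Return a minimal Galaxy-style tool wrapper as an XML string.

    All identifiers passed in are illustrative placeholders.
    """
    tool = ET.Element("tool", id=tool_id, name=name, version="0.1.0")
    ET.SubElement(tool, "description").text = "Count lines in a dataset"

    # Command line that Galaxy would run; $input and $output are filled
    # in by Galaxy from the parameters declared below.
    command = ET.SubElement(tool, "command")
    command.text = f"python '$__tool_directory__/{script}' '$input' '$output'"

    # One input dataset; Galaxy renders this as a web form field.
    inputs = ET.SubElement(tool, "inputs")
    ET.SubElement(inputs, "param", name="input", type="data",
                  format="txt", label="Input dataset")

    # One output dataset that appears in the user's history.
    outputs = ET.SubElement(tool, "outputs")
    ET.SubElement(outputs, "data", name="output", format="txt")

    return ET.tostring(tool, encoding="unicode")


if __name__ == "__main__":
    print(build_tool_xml("line_count", "Line count", "line_count.py"))
```

The XML declares the form fields the user sees, while the Python script named in the command does the actual work on the server.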

Hosting Workflows

1) Test data
2) Software files
3) Instructions
+ Galaxy implementation

Can we reproduce results? SOAPdenovo2 S. aureus pipeline
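One simple way to answer "can we reproduce results?" is to compare a regenerated output file against the hosted reference copy. The sketch below does this with SHA-256 checksums; the function names are hypothetical, and it assumes the pipeline is deterministic. Tools whose output varies between runs (for example through thread ordering) would need a more tolerant comparison.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the SHA-256 hex digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large data files are not loaded whole.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_reference(new_output, reference_output):
    """True if the regenerated file is byte-identical to the reference."""
    return sha256_of(new_output) == sha256_of(reference_output)
```

A byte-for-byte match is the strictest possible check; a mismatch only says the files differ, not whether the difference matters scientifically.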

Galaxy
• Most accessible
• Easy to share (Galaxy ToolShed)
• Quite a bit of work
• Doesn't include publication explanations

Workflows 2. Executable Docs

Open lab books, dynamic documents
• Facilitate reuse and sharing with tools like: Knitr, Sweave, IPython Notebook

Sweave

• Working towards executable papers…

E.g.

Some testimonials for Knitr

Authors (Wolfgang Huber)
“I do all my projects in Knitr. Having the textual explanation, the associated code and the results all in one place really increases productivity, and helps explaining my analyses to colleagues, or even just to my future self.”

Reviewers (Christophe Pouzat)
“It took me a couple of hours to get the data, the few custom developed routines, the “vignette” and to REPRODUCE EXACTLY the analysis presented in the manuscript. With few more hours, I was able to modify the authors’ code to change their Fig. 4. In addition to making the presented research trustworthy, the reproducible research paradigm definitely makes the reviewer’s job much more fun!”

Executable docs:
• Completely reproduce paper!
• May require some code-skills
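The core idea behind an executable document is that every number in the text is computed from the data when the document is built, rather than typed in by hand. The sketch below shows that idea in plain Python, not in Knitr or Sweave themselves; the template wording and the read lengths are made-up placeholders.

```python
from statistics import mean
from string import Template

# Text of the "paper" with placeholders that are filled in from code.
REPORT = Template("Mean read length: $mean_len bp across $n samples.")


def render_report(read_lengths):
    """Compute the statistics and substitute them into the text."""
    return REPORT.substitute(
        mean_len=f"{mean(read_lengths):.1f}",
        n=len(read_lengths),
    )


if __name__ == "__main__":
    # Placeholder values for illustration only.
    print(render_report([100, 150, 125]))
```

If the input data change, re-running the script regenerates the sentence, so text and results cannot drift apart.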

Workflow accessibility: VMs

Why VMs?

• OS settings
• Dependencies
– Versions
– e.g. python!
• Data + Code linked
• Download or run in cloud
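A VM captures OS settings and dependency versions implicitly. As a lightweight complement, the same information can be written down explicitly next to the results. The sketch below records the OS, the Python version, and the versions of selected packages; the function name and the package list are hypothetical choices, not something prescribed in the talk.

```python
import json
import platform
import sys
from importlib import metadata


def environment_snapshot(packages=()):
    """Return OS, Python, and package versions as a plain dict."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Record missing packages explicitly instead of failing.
            versions[name] = None
    return {
        "os": platform.platform(),
        "python": sys.version.split()[0],
        "packages": versions,
    }


if __name__ == "__main__":
    # "numpy" is only an example; list whatever the analysis imports.
    print(json.dumps(environment_snapshot(["numpy"]), indent=2))
```

Saving this JSON alongside the outputs gives a reader the versions to match, even when a full VM image is not available.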

VMs in GigaDB

VMs:
• Can host Galaxy
• Can hold Knitr code
• Provides 'snapshot' of working system

Summary

• Share data in GigaDB
• Share all images in GigaDB
– View images via OMERO
• Share code in GigaDB!
• Share pipeline using:
– Executable docs!
– Galaxy!
– VMs!

Give us data, papers & pipelines*

Improve reproducibility!

Contact us:
scott@gigasciencejournal.com
editorial@gigasciencejournal.com
database@gigasciencejournal.com

* APCs currently generously covered by BGI until 2015

www.gigasciencejournal.com

Thanks to:

Team: Peter Li, Huayan Gao, Chris Hunter, Jesse Si Zhe, Nicole Nogoy, Laurie Goodman, Amye Kenall (BMC)

Our collaborators: Marco Roos (LUMC), Mark Thompson (LUMC), Jun Zhao (Lancaster), Susanna Sansone (Oxford), Philippe Rocca-Serra (Oxford), Alejandra Gonzalez-Beltran (Oxford)

Case study: Ruibang Luo (BGI/HKU), Shaoguang Liang (BGI-SZ), Tin-Lap Lee (CUHK), Qiong Luo (HKUST), Senghong Wang (HKUST), Yan Zhou (HKUST)

CBIIT

Funding from:

@gigascience
facebook.com/GigaScience
blogs.biomedcentral.com/gigablog/

www.gigadb.org
galaxy.cbiit.cuhk.edu.hk
