g3 talk rld_2

Post on 10-Jul-2015

Category:

Science

TRANSCRIPT

Tools for:

Open-Source Open-Data

Rob L Davidson about.me/rob.davidson

The problem

reproducibility.cs.arizona.edu
• 515 papers (429 conference, 86 journal)
• <30% reproducible

The Cause

• Stodden 2010: 638 registrants at NIPS

• 30% share code

• 20% share data

http://web.stanford.edu/~vcs/papers/SMPRCS2010.pdf

Publishers must provide!

Hosting

Curating

Citations for everything:

data, tools + workflows

Tools for Reproducibility

• Data: GigaDB
• Images: OMERO
• Workflows
– Galaxy
– Executable Docs
– VMs

GigaDB

github.com/gigascience/gigadb-cogini

Hosting all data

Hosting all research objects

Impact for research objects

• Host
• Curate
• Share
• Cite - DOI

Even more accessible, transparent data?

Hosting image data with OMERO

Hosting Images

• Image LIMS
• Web embedding
– View online, no need for software
• Full res
• Link all images to publication
– No cherry picking

http://www.openmicroscopy.org/site/products/omero

NO

Cyber-Centipedes! Phenotyping

Accessible Cyber-Centipede images

OMERO: providing access to imaging data

View, filter, measure raw images with direct links from journal article.

See all image data, not just cherry-picked examples.

Download and reprocess.

OMERO: Adding value

The alternative...

...look but don't touch

Workflows

1. Galaxy

galaxyproject.org

galaxy.cbiit.cuhk.edu.hk

Implement workflows in a community-accepted format

http://galaxyproject.org

Over 45,000 main Galaxy server users

Over 1,000 papers citing Galaxy use

Over 55 Galaxy servers deployed

Open source

Tool list, Tool parameterisation, Results panel

Copyright NBAF-B 2013

Implement workflows in an intuitive format

Visualising Workflows

Birmingham Metabo-Galaxy Workflow

Birmingham Metabo-Galaxy

Tools wrapped in Python and XML
User sees web form (easy!)
Data stored centrally (secure!)
Work done centrally (easy update)

Hosting Workflows

1) Test data
2) Software files
3) Instructions
+ Galaxy implementation

Can we reproduce results? SOAPdenovo2 S. aureus pipeline

Workflows

2. Executable Docs

Open lab books, dynamic documents

• Facilitate reuse and sharing with tools like: Knitr, Sweave, iPython Notebook

Sweave

• Working towards executable papers…

E.g.

Some testimonials for Knitr

Authors (Wolfgang Huber)

“I do all my projects in Knitr. Having the textual explanation, the associated code and the results all in one place really increases productivity, and helps explaining my analyses to colleagues, or even just to my future self.”

Reviewers (Christophe Pouzat)

“It took me a couple of hours to get the data, the few custom developed routines, the “vignette” and to REPRODUCE EXACTLY the analysis presented in the manuscript. With few more hours, I was able to modify the authors’ code to change their Fig. 4. In addition to making the presented research trustworthy, the reproducible research paradigm definitely makes the reviewer’s job much more fun!”

Workflow accessibility:

VMs

Why VMs?

• OS settings
• Dependencies
– Versions
– e.g. python!
• Data + Code linked
• Download or run in cloud
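The dependency and version problem that VMs address can be illustrated with a small sketch. This is not part of the talk or of GigaDB; the function names `capture_environment` and `differences` are hypothetical, and the example assumes Python 3.8 or later (for `importlib.metadata`). It records the interpreter version, the OS, and the installed versions of named packages, then reports which of those differ from an earlier record.

```python
import platform
from importlib import metadata


def capture_environment(packages):
    """Return a dict describing the interpreter, OS, and package versions."""
    env = {
        "python": platform.python_version(),
        "os": platform.platform(),
        "packages": {},
    }
    for name in packages:
        try:
            env["packages"][name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Package is not installed in this environment.
            env["packages"][name] = None
    return env


def differences(recorded, current):
    """List the entries that differ between two environment records."""
    diffs = []
    for key in ("python", "os"):
        if recorded[key] != current[key]:
            diffs.append(key)
    for name, version in recorded["packages"].items():
        if current["packages"].get(name) != version:
            diffs.append(f"packages.{name}")
    return diffs
```

A record like this, saved alongside the data and code, only tells a reader what has drifted. A VM goes further by shipping the whole environment, so nothing has to be reinstalled or matched by hand.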

VMs in GigaDB

Summary

Share data in GigaDB

Share all images in GigaDB

– View images via OMERO

Share code in GigaDB!

Share pipeline using:

Executable docs!

Galaxy!

VMs!

Give us data, papers & pipelines*

Improve reproducibility!

Contact us:

scott@gigasciencejournal.com
editorial@gigasciencejournal.com
database@gigasciencejournal.com

* APCs currently generously covered by BGI until 2015

www.gigasciencejournal.com

Thanks to:

Ruibang Luo (BGI/HKU)
Shaoguang Liang (BGI-SZ)
Tin-Lap Lee (CUHK)
Qiong Luo (HKUST)
Senghong Wang (HKUST)
Yan Zhou (HKUST)

Peter Li
Huayan Gao
Chris Hunter
Jesse Si Zhe
Nicole Nogoy
Laurie Goodman
Amye Kenall (BMC)

Marco Roos (LUMC)
Mark Thompson (LUMC)
Jun Zhao (Lancaster)
Susanna Sansone (Oxford)
Philippe Rocca-Serra (Oxford)
Alejandra Gonzalez-Beltran (Oxford)

@gigascience
facebook.com/GigaScience
blogs.biomedcentral.com/gigablog/

www.gigadb.org
galaxy.cbiit.cuhk.edu.hk

CBIIT

Funding from:

Our collaborators:

team:

Case study:
