Page 1: G3 talk rld_2

Tools for:

Open-Source
Open-Data

Rob L Davidson about.me/rob.davidson

Page 2: G3 talk rld_2

The problem

reproducibility.cs.arizona.edu
• 515 papers (429 conf, 86 journal)
• <30% reproducible

Page 3: G3 talk rld_2

The problem

reproducibility.cs.arizona.edu

Page 4: G3 talk rld_2

The Cause

• Stodden 2010
– 638 registrants at NIPS

• 30% share code

• 20% share data

http://web.stanford.edu/~vcs/papers/SMPRCS2010.pdf

Page 5: G3 talk rld_2

Publishers must provide!

Hosting

Curating

Citations for everything:

data, tools + workflows

Page 6: G3 talk rld_2

Tools for Reproducibility

• Data: GigaDB

• Images: OMERO
• Workflows
– Galaxy
– Executable Docs
– VMs

Page 7: G3 talk rld_2

GigaDB

github.com/gigascience/gigadb-cogini

Page 8: G3 talk rld_2

Hosting all data

Page 9: G3 talk rld_2

Hosting all research objects

Page 10: G3 talk rld_2

Impact for research objects

• Host
• Curate
• Share

• Cite - DOI

Page 11: G3 talk rld_2

Even more accessible, transparent data?

Hosting image data with OMERO

Page 12: G3 talk rld_2

Hosting Images

• Image LIMS
• Web embedding
– View online, no need for software
• Full res
• Link all images to publication
– No cherry picking

http://www.openmicroscopy.org/site/products/omero

Page 13: G3 talk rld_2

NO

Cyber-Centipedes! Phenotyping

Page 14: G3 talk rld_2

Accessible Cyber-Centipede images

OMERO: providing access to imaging data

View, filter, measure raw images with direct links from journal article.

See all image data, not just cherry-picked examples.

Download and reprocess.

Page 15: G3 talk rld_2

OMERO: Adding value

Page 16: G3 talk rld_2

The alternative...

...look but don't touch

Page 17: G3 talk rld_2

Workflows

1. Galaxy

galaxyproject.org

Page 18: G3 talk rld_2

galaxy.cbiit.cuhk.edu.hk

Page 19: G3 talk rld_2

Implement workflows in a community-accepted format

http://galaxyproject.org

Over 45,000 main Galaxy server users

Over 1,000 papers citing Galaxy use

Over 55 Galaxy servers deployed

Open source

Page 20: G3 talk rld_2

Copyright NBAF-B 2013

[Screenshot panels: Tool list, Tool parameterisation, Results panel]

Implement workflows in an intuitive format

Page 21: G3 talk rld_2

Visualising Workflows

Page 22: G3 talk rld_2

Birmingham Metabo-Galaxy Workflow

Page 23: G3 talk rld_2

Birmingham Metabo-Galaxy

• Tools wrapped in Python and XML
• User sees web form (easy!)
• Data stored centrally (secure!)
• Work done centrally (easy update)
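As a rough illustration of what "wrapped in Python" can mean, here is a minimal sketch of a command-line script of the kind an XML tool description could point a web form at. The script, its `--input`/`--output` options and the `column_means` helper are hypothetical examples, not part of the Birmingham Metabo-Galaxy tools.

```python
import argparse
import csv
import sys


def column_means(rows):
    """Return the mean of each column in a list of equal-length numeric rows."""
    columns = list(zip(*rows))
    return [sum(col) / len(col) for col in columns]


def main(argv=None):
    # Hypothetical options: a wrapper would fill these in from the web form.
    parser = argparse.ArgumentParser(description="Column means of a tab-separated table")
    parser.add_argument("--input", required=True, help="tab-separated numeric table")
    parser.add_argument("--output", required=True, help="where to write the means")
    args = parser.parse_args(argv)

    # Read every non-empty row as floats.
    with open(args.input, newline="") as handle:
        rows = [[float(v) for v in row] for row in csv.reader(handle, delimiter="\t") if row]

    # Write a single tab-separated row of column means.
    with open(args.output, "w", newline="") as handle:
        csv.writer(handle, delimiter="\t").writerow(column_means(rows))


# Only act as a command-line tool when arguments were actually supplied.
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

The user never sees this script: the wrapper runs it on the server, which is why updates can be made centrally.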

Page 24: G3 talk rld_2
Page 25: G3 talk rld_2

Hosting Workflows

Page 26: G3 talk rld_2

Hosting Workflows

1) Test data
2) Software files
3) Instructions
+ Galaxy implementation
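One way a hosted bundle with test data could be used to check a rerun is to compare checksums of the produced files against a published manifest. The sketch below assumes such a manifest exists as a name-to-SHA-256 mapping; the function names and the manifest format are illustrative assumptions, not a GigaDB feature.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """SHA-256 checksum of a byte string, as lowercase hex."""
    return hashlib.sha256(data).hexdigest()


def file_sha256(path, chunk_size=1 << 20):
    """SHA-256 checksum of a file, read in chunks so large data fits in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def mismatches(manifest, produced):
    """Return the sorted names that are missing or whose checksum differs.

    manifest: name -> expected checksum (hypothetical published values)
    produced: name -> checksum of the file the rerun generated
    """
    return sorted(
        name for name, expected in manifest.items()
        if produced.get(name) != expected
    )
```

Byte-identical output is a strict test; pipelines with non-deterministic steps may need a looser comparison, such as summary statistics.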

Page 27: G3 talk rld_2

Can we reproduce results? SOAPdenovo2 S. aureus pipeline

Page 28: G3 talk rld_2

Workflows

2. Executable Docs

Page 29: G3 talk rld_2

Open lab books, dynamic documents

• Facilitate reuse and sharing with tools like: Knitr, Sweave, IPython Notebook

Sweave

• Working towards executable papers…
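The idea behind a dynamic document can be sketched in a few lines: the numbers in the text are computed when the document is built, so text and analysis cannot drift apart. This is a toy illustration using only the Python standard library, with made-up measurements; it is not how Knitr, Sweave or the notebook work internally.

```python
import statistics
from string import Template

# Hypothetical example data standing in for a real analysis input.
measurements = [4.0, 5.0, 6.0]

# The "analysis": every value quoted in the text is computed here.
results = {
    "n": len(measurements),
    "mean": f"{statistics.mean(measurements):.2f}",
    "sd": f"{statistics.stdev(measurements):.2f}",
}

# The "document": prose with placeholders instead of hand-typed numbers.
DOCUMENT = Template("We measured $n samples; the mean was $mean (SD $sd).")


def render(template, values):
    """Fill the placeholders in the text with freshly computed values."""
    return template.substitute(values)


report = render(DOCUMENT, results)
print(report)
```

Changing the data and rebuilding updates the sentence automatically, which is the property the tools above provide at the scale of a whole paper.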

Page 30: G3 talk rld_2

E.g.

Page 31: G3 talk rld_2

E.g.

Page 32: G3 talk rld_2

Some testimonials for Knitr

Authors (Wolfgang Huber)

“I do all my projects in Knitr. Having the textual explanation, the associated code and the results all in one place really increases productivity, and helps explaining my analyses to colleagues, or even just to my future self.”

Reviewers (Christophe Pouzat)

“It took me a couple of hours to get the data, the few custom developed routines, the “vignette” and to REPRODUCE EXACTLY the analysis presented in the manuscript. With few more hours, I was able to modify the authors’ code to change their Fig. 4. In addition to making the presented research trustworthy, the reproducible research paradigm definitely makes the reviewer’s job much more fun!”

Page 33: G3 talk rld_2

Workflow accessibility:

VMs

Page 34: G3 talk rld_2

Why VMs?

• OS settings
• Dependencies
– Versions
– e.g. python!
• Data + Code linked
• Download or run in cloud
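To make the dependency problem concrete, the sketch below records the operating system, Python version and package versions that an analysis ran with. A VM freezes all of this at once; a snapshot like this only documents it. The function name and the choice of packages are assumptions for illustration.

```python
import json
import platform
from importlib import metadata


def describe_environment(packages=()):
    """Record OS, Python version and the installed version of each named package."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed here: record that explicitly rather than failing.
            versions[name] = None
    return {
        "os": platform.platform(),
        "python": platform.python_version(),
        "packages": versions,
    }


# Example: "pip" is just a package that is commonly present.
print(json.dumps(describe_environment(["pip"]), indent=2))
```

Two machines that print different snapshots may give different results from the same code and data, which is the case a shared VM image avoids.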

Page 35: G3 talk rld_2

VMs in GigaDB

Page 36: G3 talk rld_2

Summary

Page 37: G3 talk rld_2

Share data in GigaDB

Share all images in GigaDB

– View images via OMERO

Share code in GigaDB!

Share pipeline using:

Executable docs!

Galaxy!

VMs!

Page 38: G3 talk rld_2

Give us data, papers & pipelines*

Improve reproducibility!

[email protected] [email protected] [email protected]

Contact us:

* APCs currently generously covered by BGI until 2015

www.gigasciencejournal.com

Page 39: G3 talk rld_2

Ruibang Luo (BGI/HKU)
Shaoguang Liang (BGI-SZ)
Tin-Lap Lee (CUHK)
Qiong Luo (HKUST)
Senghong Wang (HKUST)
Yan Zhou (HKUST)

Thanks to:

@gigascience
facebook.com/GigaScience
blogs.biomedcentral.com/gigablog/

Peter Li
Huayan Gao
Chris Hunter
Jesse Si Zhe
Nicole Nogoy
Laurie Goodman
Amye Kenall (BMC)

Marco Roos (LUMC)
Mark Thompson (LUMC)
Jun Zhao (Lancaster)
Susanna Sansone (Oxford)
Philippe Rocca-Serra (Oxford)
Alejandra Gonzalez-Beltran (Oxford)

www.gigadb.org
galaxy.cbiit.cuhk.edu.hk

www.gigasciencejournal.com

CBIIT

Funding from:

Our collaborators:

team:

Case study:

