Developing an open source community for cloud bioinformatics

Developing an open source community for cloud bioinformatics Brad Chapman http://bcbio.wordpress.com/ 8 June 2010

Upload: brad-chapman

Post on 28-Aug-2014


DESCRIPTION

Talk for Amazon workshop: http://aws.amazon.com/genomics_workshop/

TRANSCRIPT

Page 1: Developing an open source community for cloud bioinformatics

Developing an open source community for cloud bioinformatics

Brad Chapman
http://bcbio.wordpress.com/

8 June 2010

Page 2: Developing an open source community for cloud bioinformatics

Overview

1 Building open source bioinformatics communities is hard.

2 Developer resources are a productive target.

3 Framework: collaborative software images and data snapshots.

Page 3: Developing an open source community for cloud bioinformatics

Motivation

Open source
OpenBio, Biopython
Graduate school – developed distributed algorithm. Never reused.

Work
Startup: Automated biological pipelines.
Research hospital: Democratization of analysis.

Page 4: Developing an open source community for cloud bioinformatics

Filters in biological computing

Working in same biological area

Interest in developing open source code

Technical abilities

Your software is good enough

Page 5: Developing an open source community for cloud bioinformatics

Successful bioinformatics

Sean Eddy, HMMER

...the best software in the field is often an unplanned labor of love from a single investigator.

http://selab.janelia.org/people/eddys/blog/?p=313

Page 6: Developing an open source community for cloud bioinformatics

Recognizing contributions

Page 7: Developing an open source community for cloud bioinformatics

Successful community projects

OpenBio: BioPerl, Biopython, BioJava

Bioconductor

Common theme

Aimed at developers.

Biologists benefit indirectly.

Page 8: Developing an open source community for cloud bioinformatics

Lowering activation energy

Page 9: Developing an open source community for cloud bioinformatics

Establishing common platform

= The solution to all our problems

Remove install and distribution barriers

Building block for scaling

Page 10: Developing an open source community for cloud bioinformatics

Existing cloud bioinformatics work

JCVI Cloud BioLinux

bioperl-max

MachetEC2

Debian Med

Overlapping set of useful functionality.

Page 11: Developing an open source community for cloud bioinformatics

Integrated community solution

Inclusive but configurable

Easy to contribute

Automated

Bootstrap bare machine to fully ready distributed AMI.

http://github.com/chapmanb/bcbb/tree/master/ec2/biolinux/

Page 12: Developing an open source community for cloud bioinformatics

Inclusive but configurable

# Top level YAML configuration file specifying
# groups of programs to be installed.
packages:
  - python
  - r
  - erlang
  - databases
  - viz
  - bio_search
  - bio_alignment
  - bio_nextgen
  - bio_sequencing
  - bio_visualization
  - phylogeny

libraries:
  - r-libs
  - python-libs
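To make the shape of this configuration concrete, here is a minimal sketch of reading it into named groups. It is not the project's own loader: `read_groups` and the shortened `EXAMPLE` text are hypothetical, and the parser only handles the flat "key:" plus "- item" layout shown on the slide so that it runs without a YAML library.

```python
def read_groups(text):
    """Parse a flat 'key:' / '- item' config into a dict of lists.

    Hypothetical helper: it covers only the subset of YAML used on the
    slide, not YAML in general.
    """
    groups = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("- "):
            if current is None:
                raise ValueError("list item before any section: %r" % line)
            groups[current].append(line[2:].strip())
        elif line.endswith(":"):
            current = line[:-1].strip()
            groups[current] = []
        else:
            raise ValueError("unsupported line: %r" % line)
    return groups


# Shortened copy of the slide's configuration, for illustration only.
EXAMPLE = """\
# Top level configuration: groups of programs to install.
packages:
  - python
  - bio_alignment
libraries:
  - r-libs
  - python-libs
"""

config = read_groups(EXAMPLE)
print(config["packages"])
print(config["libraries"])
```

Turning a group on or off is then a one-line change in the configuration file, which is one way to read "inclusive but configurable".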

Page 13: Developing an open source community for cloud bioinformatics

Easy to contribute

# Configuration file defining R specific libraries that
# are installed via CRAN and Bioconductor.
cranrepo: http://software.rc.fas.harvard.edu/mirrors/R/
cran:
  - ggplot2
  - rjson
  - sqldf
  - NMF
  - ape

biocrepo: http://bioconductor.org/biocLite.R
bioc:
  - ShortRead
  - BSgenome
  - edgeR
  - GOstats
  - biomaRt
  - Rsamtools
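As a rough illustration of how a configuration like this could drive an install, the sketch below turns the parsed values into R statements. `r_install_commands` and the shortened `R_LIBS` dictionary are hypothetical; the generated statements use R's `install.packages` and the `biocLite` script that Bioconductor provided at the time of the talk. How the actual scripts invoke R is not shown on the slide.

```python
def r_install_commands(config):
    """Return R statements that would install the configured libraries.

    Hypothetical helper: it only builds strings and does not run R.
    """
    commands = []
    # CRAN packages are installed from the configured mirror.
    for name in config.get("cran", []):
        commands.append(
            'install.packages("%s", repos="%s")' % (name, config["cranrepo"])
        )
    # Bioconductor packages need the biocLite script sourced once first.
    if config.get("bioc"):
        commands.append('source("%s")' % config["biocrepo"])
        for name in config["bioc"]:
            commands.append('biocLite("%s")' % name)
    return commands


# Shortened copy of the slide's R library configuration.
R_LIBS = {
    "cranrepo": "http://software.rc.fas.harvard.edu/mirrors/R/",
    "cran": ["ggplot2", "rjson"],
    "biocrepo": "http://bioconductor.org/biocLite.R",
    "bioc": ["ShortRead", "edgeR"],
}

for command in r_install_commands(R_LIBS):
    print(command)
```

Under this scheme, contributing a new R library means adding one list entry rather than writing install code.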

Page 14: Developing an open source community for cloud bioinformatics

Automated

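from fabric.api import sudo  # Fabric 1.x API

def install_biolinux():
    # Prepare the target environment, read the configuration, then install.
    ec2_ubuntu_environment()
    pkg_install, lib_install = _read_main_config()
    _apt_packages(pkg_install)
    _do_library_installs(lib_install)

def _ruby_library_installer(config):
    # Install each configured Ruby gem on the remote machine.
    for gem in config['gems']:
        sudo("gem install %s" % gem)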

Fabric: http://docs.fabfile.org/
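One way a function like `_do_library_installs` could route each library group to its installer is a lookup table keyed by group name. The sketch below is an assumption about that design, not the code from the repository: every name in it (`INSTALLERS`, `plan_library_installs`, the `ruby-libs` group, the `pypi` key) is hypothetical except the `gems` key, and it collects command strings instead of calling Fabric's `sudo` so that it runs without a remote host.

```python
def gem_commands(config):
    """Commands for Ruby gems; mirrors the 'gems' key used on the slide."""
    return ["gem install %s" % gem for gem in config["gems"]]


def python_commands(config):
    """Commands for Python packages; the 'pypi' key is a made-up name."""
    return ["pip install %s" % pkg for pkg in config["pypi"]]


# Hypothetical registry mapping a library group to its installer.
INSTALLERS = {
    "ruby-libs": gem_commands,
    "python-libs": python_commands,
}


def plan_library_installs(lib_install, configs):
    """Return the shell commands for every requested library group, in order."""
    commands = []
    for lib in lib_install:
        if lib not in INSTALLERS:
            raise ValueError("no installer registered for %r" % lib)
        commands.extend(INSTALLERS[lib](configs[lib]))
    return commands


configs = {
    "ruby-libs": {"gems": ["bio", "rake"]},
    "python-libs": {"pypi": ["biopython"]},
}
plan = plan_library_installs(["ruby-libs", "python-libs"], configs)
for command in plan:
    print(command)
```

In a Fabric script, each planned string would be passed to `sudo` on the target machine instead of being printed.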

Page 15: Developing an open source community for cloud bioinformatics

Ready to use biological data

% ls /referenceGenomes/
Athaliana
Celegans
Dmelanogaster
Ecoli
Hsapiens
Mmusculus
Msmegmatis
Mtuberculosis_H37Rv
Paeruginosa_UCBPP-PA14
phiX174
Rnorvegicus
Scerevisiae
Xtropicalis

% ls Hsapiens/hg18
arachne
bowtie
bwa
eland
maq
seq
snps
ucsc

http://github.com/chapmanb/bcbb/blob/master/galaxy/galaxy_fabfile.py
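The listings suggest an organism/build/tool directory layout. Assuming that reading is right, and that `Hsapiens/hg18` sits under `/referenceGenomes`, a pipeline could locate an index with a small helper like the hypothetical `index_dir` below.

```python
import posixpath

# Root of the shared data snapshot, as shown in the listing.
BASE = "/referenceGenomes"


def index_dir(organism, build, tool, base=BASE):
    """Build the directory path for one tool's index of a genome build.

    Hypothetical helper: the organism/build/tool layout is inferred
    from the slide's directory listings.
    """
    for part in (organism, build, tool):
        if not part or "/" in part:
            raise ValueError("invalid path component: %r" % part)
    return posixpath.join(base, organism, build, tool)


print(index_dir("Hsapiens", "hg18", "bowtie"))
# prints /referenceGenomes/Hsapiens/hg18/bowtie
```

A fixed layout of this kind lets every machine that mounts the same data snapshot find indexes without per-machine configuration.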

Page 16: Developing an open source community for cloud bioinformatics

Organization: Codefest 2010

www.open-bio.org/wiki/Codefest_2010