Developing an open source community for cloud bioinformatics
Talk for Amazon workshop: http://aws.amazon.com/genomics_workshop/
Brad Chapman
http://bcbio.wordpress.com/
8 June 2010
Overview
1. Building open source bioinformatics communities is hard.
2. Developer resources are a productive target.
3. Framework: collaborative software images and data snapshots.
Motivation
Open source
OpenBio, Biopython
Graduate school – developed distributed algorithm. Never reused.
Work
Startup: Automated biological pipelines.
Research hospital: Democratization of analysis.
Filters in biological computing
Working in same biological area
Interest in developing open source code
Technical abilities
Your software is good enough
Successful bioinformatics
Sean Eddy, HMMER
...the best software in the field is often an
unplanned labor of love from a single
investigator.
http://selab.janelia.org/people/eddys/blog/?p=313
Recognizing contributions
Successful community projects
OpenBio: BioPerl, Biopython, BioJava
Bioconductor
Common theme
Aimed at developers.
Biologists benefit indirectly.
Lowering activation energy
Establishing common platform
= The solution to all our problems
Remove install and distribution barriers
Building block for scaling
Existing cloud bioinformatics work
JCVI Cloud BioLinux
bioperl-max
MachetEC2
Debian Med
Overlapping set of useful functionality.
Integrated community solution
Inclusive but configurable
Easy to contribute
Automated
Bootstrap bare machine to fully ready
distributed AMI.
http://github.com/chapmanb/bcbb/tree/master/ec2/biolinux/
Inclusive but configurable
# Top level YAML configuration file specifying
# groups of programs to be installed.
packages:
  - python
  - r
  - erlang
  - databases
  - viz
  - bio_search
  - bio_alignment
  - bio_nextgen
  - bio_sequencing
  - bio_visualization
  - phylogeny
libraries:
  - r-libs
  - python-libs
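One way such a top-level file could drive an install is by expanding each selected group into concrete package names. The sketch below assumes the YAML has already been parsed into a dictionary. The `GROUP_DEFINITIONS` mapping, the `expand_groups` helper, and the package names are hypothetical placeholders, not the contents of the real configuration.

```python
# Hypothetical mapping from group name to package names. These values are
# placeholders for illustration, not taken from the real configuration.
GROUP_DEFINITIONS = {
    "python": ["python-dev", "python-numpy"],
    "bio_alignment": ["bwa", "bowtie"],
    "bio_nextgen": ["bowtie", "samtools"],
}


def expand_groups(selected, definitions):
    """Return package names for the selected groups, de-duplicated, in order."""
    packages = []
    seen = set()
    for group in selected:
        if group not in definitions:
            raise KeyError("Unknown package group: %s" % group)
        for name in definitions[group]:
            # A package shared by two groups is only listed once.
            if name not in seen:
                seen.add(name)
                packages.append(name)
    return packages


# Stand-in for the parsed top-level YAML file.
config = {"packages": ["python", "bio_alignment", "bio_nextgen"]}
to_install = expand_groups(config["packages"], GROUP_DEFINITIONS)
```

Dropping a group from the `packages` list removes its programs from the build, which is one reading of "inclusive but configurable".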
Easy to contribute
# Configuration file defining R specific libraries that
# are installed via CRAN and Bioconductor.
cranrepo: http://software.rc.fas.harvard.edu/mirrors/R/
cran:
  - ggplot2
  - rjson
  - sqldf
  - NMF
  - ape
biocrepo: http://bioconductor.org/biocLite.R
bioc:
  - ShortRead
  - BSgenome
  - edgeR
  - GOstats
  - biomaRt
  - Rsamtools
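As a rough illustration of how a library configuration like this might be turned into install steps, the sketch below builds the text of an R script from an already-parsed dictionary. The `r_install_script` helper is hypothetical. It assumes CRAN libraries are installed with R's `install.packages` and Bioconductor libraries through the `biocLite` function loaded from the `biocrepo` URL; the real installer may work differently.

```python
def r_install_script(config):
    """Build R source text installing the configured CRAN and Bioconductor libraries."""
    def r_vector(names):
        # Render a Python list as an R character vector: c("a", "b")
        return "c(%s)" % ", ".join('"%s"' % n for n in names)

    lines = []
    if config.get("cran"):
        lines.append('install.packages(%s, repos="%s")'
                     % (r_vector(config["cran"]), config["cranrepo"]))
    if config.get("bioc"):
        # Assumption: sourcing the biocrepo script defines biocLite().
        lines.append('source("%s")' % config["biocrepo"])
        lines.append("biocLite(%s)" % r_vector(config["bioc"]))
    return "\n".join(lines)


# Stand-in for the parsed YAML, shortened to two libraries per section.
r_config = {
    "cranrepo": "http://software.rc.fas.harvard.edu/mirrors/R/",
    "cran": ["ggplot2", "rjson"],
    "biocrepo": "http://bioconductor.org/biocLite.R",
    "bioc": ["ShortRead", "edgeR"],
}
script = r_install_script(r_config)
```

Under this scheme, contributing a new library only means adding one line to the YAML list; no installer code changes.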
Automated
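from fabric.api import sudo

def install_biolinux():
    # Bootstrap a bare EC2 Ubuntu machine, then install the
    # packages and libraries selected in the main configuration.
    ec2_ubuntu_environment()
    pkg_install, lib_install = _read_main_config()
    _apt_packages(pkg_install)
    _do_library_installs(lib_install)

def _ruby_library_installer(config):
    # Install each Ruby gem listed in the configuration.
    for gem in config['gems']:
        sudo("gem install %s" % gem)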
Fabric: http://docs.fabfile.org/
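The per-language installer pattern above could be generalized with a small dispatch table. This is a minimal sketch, not the project's actual code: the names `INSTALLERS`, `do_library_installs`, `ruby-libs`, and the `pypi` key are hypothetical, and the command runner is passed in as a parameter so the logic can be dry-run without Fabric or a remote machine.

```python
def gem_installer(config, run):
    """Issue one install command per Ruby gem."""
    for gem in config["gems"]:
        run("gem install %s" % gem)


def python_installer(config, run):
    """Issue one install command per Python package (hypothetical 'pypi' key)."""
    for pkg in config["pypi"]:
        run("pip install %s" % pkg)


# Hypothetical table mapping a library group name to its installer function.
INSTALLERS = {
    "ruby-libs": gem_installer,
    "python-libs": python_installer,
}


def do_library_installs(to_install, configs, run):
    """Call the matching installer for each requested library group."""
    for name in to_install:
        INSTALLERS[name](configs[name], run)


# Dry run: collect the commands in a list instead of executing them.
issued = []
do_library_installs(
    ["ruby-libs", "python-libs"],
    {"ruby-libs": {"gems": ["rake", "bio"]},
     "python-libs": {"pypi": ["biopython"]}},
    issued.append,
)
```

In a Fabric 1.x fabfile, passing `sudo` from `fabric.api` as `run` would execute the same commands on the target host.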
Ready to use biological data
% ls /referenceGenomes/
Athaliana
Celegans
Dmelanogaster
Ecoli
Hsapiens
Mmusculus
Msmegmatis
Mtuberculosis_H37Rv
Paeruginosa_UCBPP-PA14
phiX174
Rnorvegicus
Scerevisiae
Xtropicalis
% ls Hsapiens/hg18
arachne
bowtie
bwa
eland
maq
seq
snps
ucsc
http://github.com/chapmanb/bcbb/blob/master/galaxy/galaxy_fabfile.py