Upload: joachim-jacob

Post on 16-Apr-2017

The Galaxy ToolShed

The software repository of Galaxy

Galaxy is an interface to a system

CPU, storage, binaries, libraries

GALAXY

GUI

Toolshed: software repository

In the long term: start from an empty Galaxy and install only the wanted tools, on a per-user basis.

Toolshed: get your own?...

Code is part of main distribution

./run_tool_shed.sh

Very easy to have it run locally...

Code is shared through hg

Galaxy main code on Bitbucket

hg

Galaxy server + Toolshed

Toolshed: run your own?

Toolshed is completely separate process to Galaxy

Uses its own PostgreSQL database: you need to create a new user account

The Toolshed's files need to be stored separately, next to the Galaxy root

Sharing a tool is basically simple

All you have to share is (if it's a simple script):

tool_conf.xml
tool.pl

This can be distributed using the Tool Shed

Dependencies have to be installed separately

Sharing through the toolshed

Galaxy is moving towards installing everything through the Tool Shed: see shed_tool_conf.xml

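Entry in shed_tool_conf.xml (XML tags lost in extraction; labels inferred from the id path):
- tool shed: toolshed.g2.bx.psu.edu
- repository: sed_wrapper
- owner: bjoern-gruening
- changeset revision: e850a63e5aed
- tool id: toolshed.g2.bx.psu.edu/repos/bjoern-gruening/sed_wrapper/sed_stream_editor/0.0.1
- version: 0.0.1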
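
The tool id in this entry packs several fields into one path. A minimal Python sketch of splitting it, assuming the layout shed/repos/owner/repository/tool/version that the id above appears to follow. The names ShedToolId and parse_shed_tool_id are hypothetical helpers for illustration, not part of Galaxy:

```python
from typing import NamedTuple


class ShedToolId(NamedTuple):
    tool_shed: str
    owner: str
    repository: str
    tool: str
    version: str


def parse_shed_tool_id(tool_id: str) -> ShedToolId:
    """Split '<shed>/repos/<owner>/<repository>/<tool>/<version>' into fields."""
    parts = tool_id.split("/")
    # Assumed layout: exactly six segments, the second being the literal 'repos'.
    if len(parts) != 6 or parts[1] != "repos":
        raise ValueError("unexpected tool id layout: %r" % tool_id)
    shed, _, owner, repository, tool, version = parts
    return ShedToolId(shed, owner, repository, tool, version)


if __name__ == "__main__":
    example = ("toolshed.g2.bx.psu.edu/repos/bjoern-gruening/"
               "sed_wrapper/sed_stream_editor/0.0.1")
    print(parse_shed_tool_id(example))
```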

Tasks of the toolshed

Communicate with any Galaxy that wants to install a tool from it (Galaxy admin that accepts the tool needs to add your Toolshed)

Periodically runs functional tests on the Tools

Allow people to update the tools

Co-develop tools

Philosophy: task of a tool

Some functionality is encoded redundantly in tools.

An example is visualising data: some call R, some call GNUplot.

I really think that the preferred output of Galaxy needs to be text. A versatile, strong visualisation tool can then draw graphs from that output as needed.

(PNG, PDF and other visual formats are supported.)

BTW: the two different repository types fit this view.

My original aim...

[Diagram: Prod gal and test gal; Dev 1, Dev 2, Dev 3; Tool Shed (BITS?); Update; Offic. gal-dist]

'Official' advice

Run Galaxy and toolshed locally

Develop your tool in your local Galaxy

If everything runs, wrap it up as .tar

Upload everything to Toolshed of your choice.

Test download in a test Galaxy from the Toolshed

Debug...

Do not use the Toolshed as a development environment

All code is shared through hg

Galaxy main code on Bitbucket

hg

Galaxy server + Toolshed server

FancyTool (hg repo)

SuperTool (hg repo)

PowerTool (hg repo)

Your uploaded .tar balls

Tips

To test installation: empty your local toolbox

What is mercurial?

version/source control system

Without mercurial: continuously adding changes.

With mercurial: fix certain states of your file

commits

What is mercurial?

1. keep track of the changes YOU do on your files, scripts, folders,...

joachim@joachim-laptop:~/Projects/hgprojects$ hg log
changeset:   2:726fa53bcd7d
tag:         tip
user:        Joachim Jacob
date:        Fri Nov 16 11:24:09 2012 +0100
summary:     Third change, playing with copy and remove

changeset:   1:744894cb4ee6
user:        Joachim Jacob
date:        Fri Nov 16 11:09:49 2012 +0100
summary:     I have added a small change to hello.txt

changeset:   0:b84e0105967f
user:        Joachim Jacob
date:        Fri Nov 16 11:08:01 2012 +0100
summary:     Initial commit to test

What is mercurial?

You can go back to a previous revision (e.g. hg update 2) and make changes to the files there, which creates multiple heads.

head

head

joachim@joachim-laptop:~/Projects/hgprojects$ hg update 1
1 files updated, 0 files merged, 3 files removed, 0 files unresolved
joachim@joachim-laptop:~/Projects/hgprojects$ nano hello.txt
joachim@joachim-laptop:~/Projects/hgprojects$ hg commit -m "Bug fix"
created new head
joachim@joachim-laptop:~/Projects/hgprojects$ hg summary
parent: 3:2d1d80bd0124 tip
 Bug fix
branch: default
commit: (clean)
update: 1 new changesets, 2 branch heads (merge)

What is mercurial?

Once a change is done, you can merge the heads back together into one tip.

joachim@joachim-laptop:~/Projects/hgprojects$ hg merge
merging hello.txt and another.txt to another.txt
merging hello.txt and mvtest.txt to mvtest.txt
1 files updated, 2 files merged, 0 files removed, 0 files unresolved
(branch merge, don't forget to commit)

merge

joachim@joachim-laptop:~/Projects/hgprojects$ hg commit -m 'Commit the bug fix permanently'

commit

In case of conflicts, use 'hg resolve --list' to view the conflicting files. Fix them by hand.

What is mercurial?

1. keep track of the changes YOU do on your files, scripts, folders,...
2. clone your working directory to a new directory (e.g. to work on another feature).

clone

What is mercurial?

You can compare two different repositories with incoming. If you want to merge the changes, you can use pull.

incoming

pull

merge

What is mercurial?

Run hg commit to record the change!

commit

What is mercurial?

So, in your directory, either you change or add files yourself, or Mercurial does this for you during a merge (undo with 'rollback'). Both need to be followed by a commit.

What is mercurial?

1. keep track of the changes YOU do on your files, scripts, folders,...
2. clone your working directory to a new directory (e.g. to work on another feature).
3. Share changes with other users.

Sharing in mercurial?

The directories might be located:
- on local directories: pull /path/to/directory
- on your intranet (hg serve): pull http://10.10.10.100:8000
- on the internet: hg clone http://[email protected]/repos/joachim/clcaligner

You can also export a commit, send it through email, and import it. You can also set up a push repository online on Bitbucket.

What is mercurial?

Guide!
http://mercurial.selenic.com/guide/
http://hginit.com/

Galaxy Toolshed

Galaxy Toolshed contains a bunch of Mercurial repositories you can clone

Getting ready for Galaxy development

How I develop for Galaxy:

[Workflow diagram: template; set tool name; upload to Toolshed; hg clone to Dev Galaxy; hg push]

Getting ready for Galaxy development

And the last step:

[Workflow diagram repeated (template; set tool name; upload to Toolshed; hg clone to Dev Galaxy; hg push), with Galaxy.bits.vib.be added]

How I develop for Galaxy:
- you need a personal Galaxy (hg clone )
- you might use a Toolshed repository

1. Get a template (right): a tar ball with some files.

Getting ready for Galaxy development

README

tool_data_table_conf.xml.sample

tool_dependencies.xml

tool_indices.loc.sample

tool_wrapper_template.pl

tool_wrapper.xml

2. Rename the files: - replace 'tool' with your tool name

[galaxy@joagal razers]$ ls
razers3_wrapper.xml README tool_data_table_conf.xml.sample tool_indices.loc.sample tool_wrapper_template.pl

Getting ready for Galaxy development

3. Edit the wrapper.xml: the section.

Getting ready for Galaxy development

4. Pack again everything in a tarball and upload to the test Toolshed in a new repository

Getting ready for Galaxy development

5. hg clone your repository to a folder in your development Galaxy.

[galaxy@joagal GalaxyHangar]$ hg clone http://[email protected]:9009/repos/joachim/fastqseqlen
destination directory: fastqseqlen
requesting all changes
adding changesets
adding manifests
adding file changes
added 1 changesets with 2 changes to 2 files
updating to branch default
resolving manifests
getting README
getting fastqseqlen.xml
2 files updated, 0 files merged, 0 files removed, 0 files unresolved

[galaxy@joagal GalaxyHangar]$ cd fastqseqlen/
[galaxy@joagal fastqseqlen]$ ls
fastqseqlen.xml README
[galaxy@joagal fastqseqlen]$ hg summary
parent: 0:3f22736718ef tip
 Uploaded files
branch: default
commit: (clean)
update: (current)

6. Link the complete directory to a directory under $GALAXY_HOME/tools/ and make Galaxy aware of it by modifying tool_conf.xml

Getting ready for Galaxy development

7. (re)start your Galaxy

$ ./run.sh --reload

And check if tool loads:

Getting ready for Galaxy development

8. Get your tool's parameter display right:

Fill in the rest of the tool's XML file.

Also add the .loc file (which contains your reference data) if needed.

(When you modify the XML, you have to restart Galaxy to see the changes: kill Galaxy and run ./run.sh --reload again.)

Getting ready for Galaxy development

9. Fun! Start developing your tool

Development happens in the development Galaxy, committing changes from time to time (optionally pushing them to the Toolshed)

Starting Galaxy tools development

$ hg commit -m "Alpha version of RazerS3 wrapper"
$ hg push --debug
$ hg commit -m "Some small changes"
$ hg push --debug

Mercurial credentials should be stored in ~/.hgrc (Mercurial.ini on Windows)

[ui]
username = "joachim "
verbose=True
[extensions]
hgext.graphlog =
[auth]
bb.prefix = http://192.168.10.26:9009/repos/joachim/razers
bb.username = joachim
bb.password = ********

Starting Galaxy tools development

When development is ready...

Push the last changes to the Galaxy test Toolshed.

Export from the Galaxy Test Toolshed and import in BITS Toolshed. Install in Galaxy.bits.vib.be

Galaxy manages scripts (tools)

1. Galaxy knows the location of tools, as this is set in (an) xml file(s)

2. The tool referenced by an xml file can be
- a script that does all calculations by itself (e.g. bash script, python script,...)
- a script that does calculations by using 3rd party libraries (e.g. R)
- a script that does calculations by calling a 3rd party binary

4 different XML files

integrated_tool_panel.xml - layout of panel

shed_tool_conf.xml - tools from shed

tool_conf.xml - tools from install or own

migrated_tools_conf.xml : tools removed from tool_conf.xml upon updating.

Note: these XML files only came into use with the latest update!

Galaxy installation directory

Galaxy is installed as the user galaxy in /home/galaxy/galaxy-dist

Installation and version control of this directory is done by Mercurial (config in the .hg directory; the .hgignore file excludes certain files from updates)

Installation for production required some changes: PostgreSQL database, Apache serving static content, network settings, running Galaxy as a daemon in the background

http://wiki.g2.bx.psu.edu/Admin/Get%20Galaxy

Galaxy installation directory

Galaxy is installed on linux as the user galaxy in /home/galaxy/galaxy-dist

Important locations under this directory:
- universe_wsgi.ini: general config file
- *.xml: 'embedding' of tools and types
- tools/: location of the scripts
- database/: location of the datasets

http://wiki.g2.bx.psu.edu/Admin/Get%20Galaxy

integrated_tool_panel.xml

Is compiled by Galaxy from shed_tool_conf.xml and tool_conf.xml. The ID of a tool refers to the ID value in the other *.xml files. ONLY edit it to change a tool's position in the tool panel.

tool_conf.xml

Is edited by developers to add new tools: you refer to the location of the tool XML, relative to the tools directory (tools/, set in universe_wsgi.ini).

tool.xml, the tool definition file

fasta_compute_length.py $input $output $keep_first

Every tool has an XML file that points to the script, builds the interface and sends the parameters to the tool.

Tool interface is built from XML

The tool XML points to a script

./tools/fasta_tools/fasta_compute_length.py :

#!/usr/bin/env python
"""
Uses fasta_to_len converter code.
"""

import sys
from galaxy.datatypes.converters.fasta_to_len import compute_fasta_length

compute_fasta_length( sys.argv[1], sys.argv[2], sys.argv[3] )

In this case the computation happens in Python itself. Sometimes, however, 3rd party libraries have to be installed.
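
To illustrate what "computation in Python itself" can look like, here is a standalone sketch of a FASTA length script that takes the same three arguments as the command line above (input, output, keep_first). It is not the Galaxy compute_fasta_length implementation; the function fasta_lengths and the meaning given to keep_first (truncate each title to that many characters, 0 keeps everything) are assumptions made for this example.

```python
import sys


def fasta_lengths(lines, keep_first=0):
    """Yield (title, length) for each record in an iterable of FASTA lines.

    keep_first > 0 truncates each title to that many characters
    (an assumed meaning, chosen for this sketch only).
    """
    title, length = None, 0
    for line in lines:
        line = line.rstrip("\r\n")
        if line.startswith(">"):
            # A new record starts: emit the previous one first
            if title is not None:
                yield title, length
            title = line[1:]
            if keep_first > 0:
                title = title[:keep_first]
            length = 0
        elif title is not None:
            length += len(line.strip())
    if title is not None:
        yield title, length


def main(argv):
    # Argument order as on the slide: input file, output file, keep_first
    in_path, out_path, keep_first = argv[1], argv[2], int(argv[3])
    with open(in_path) as fin, open(out_path, "w") as fout:
        for title, length in fasta_lengths(fin, keep_first):
            fout.write("%s\t%d\n" % (title, length))


# Only run as a script when all three arguments are given
if __name__ == "__main__" and len(sys.argv) == 4:
    main(sys.argv)
```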

The tool XML points to a binary

#!/usr/bin/env python

"""
Runs BWA on single-end or paired-end data.
Produces a SAM file containing the mappings.
Works with BWA version 0.5.9.

usage: bwa_wrapper.py [options]

See below for options
"""

import optparse, os, shutil, subprocess, sys, tempfile

def stop_err( msg ):
    sys.stderr.write( '%s\n' % msg )
    sys.exit()

def check_is_double_encoded( fastq ):
    # check that first read is bases, not one base followed by numbers
    bases = [ 'A', 'C', 'G', 'T', 'a', 'c', 'g', 't', 'N' ]
    nums = [ '0', '1', '2', '3' ]
    for line in file( fastq, 'rb'):
        if not line.strip() or line.startswith( '@' ):

Options for building interfaces

Overview of the tags on http://wiki.g2.bx.psu.edu/Admin/Tools/Tool%20Config%20Syntax

The parameters to construct the interface are placed within <inputs> tags

The tags you use in that section largely determine the syntax to use in the other tag sets.

BASIC USE

e.g.

Select a dataset from history

If the type of the input is data, a dropdown list of history items appears. The accepted format should be specified in the format attribute.

Choose from a list

0.001 0.002 0.003 0.004
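
The XML for these two slides did not survive extraction. As a rough sketch of what such an input section could look like, the Python below builds an inputs element with a dataset selector and a select list holding the four values above. The parameter names (input, pvalue), the labels and the fasta format are made up for illustration; check the Tool Config Syntax page for the authoritative attribute list.

```python
import xml.etree.ElementTree as ET


def build_inputs(values):
    """Return an <inputs> element with a dataset selector and a select list."""
    inputs = ET.Element("inputs")
    # type="data": rendered as a dropdown of history items of the given format
    ET.SubElement(inputs, "param", name="input", type="data",
                  format="fasta", label="Dataset from history")
    # type="select": rendered as a list with one entry per <option>
    select = ET.SubElement(inputs, "param", name="pvalue", type="select",
                           label="Choose a cut-off")
    for value in values:
        option = ET.SubElement(select, "option", value=value)
        option.text = value
    return inputs


if __name__ == "__main__":
    element = build_inputs(["0.001", "0.002", "0.003", "0.004"])
    print(ET.tostring(element, encoding="unicode"))
```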

Select reference data

For some tools, indexed data can be made available (e.g. BLAST, NGS mappers, ...). Indexed sets are referenced through tool_data_table_conf.xml, which points to the ./tool_data/.loc files

Select reference data

./tool_data_table_conf.xml:

value, dbkey, name, path

./tool_data/.loc

hg19_chr21    hg19    Human chrom 21 bld 37 (hg19)    /mnt/genomes/hg19_chrom21/bwa/base/build37_chr21.fa
hg18    hg18    Human genome bld 36 (hg18)    /mnt/genomes/hg18/bwa/base/build36.fa
hg19    hg19    Human genome bld 37 (hg19)    /mnt/genomes/hg19/bwa/base/build37.fa

The reference data is on a disk mounted on /mnt/genomes

/mnt/genomes/ (800GB)
|-- hg18
|   |-- bfast
|   |-- bowtie
|   |-- bwa
|-- hg19
|   |-- bfast
|   |-- bowtie
|   |-- bwa
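
The .loc entries follow the column order value, dbkey, name, path. A small Python sketch of reading such a file, assuming tab-separated columns and '#' comment lines; the helper name read_loc is hypothetical and Galaxy has its own loader:

```python
import io

COLUMNS = ("value", "dbkey", "name", "path")


def read_loc(handle):
    """Parse tab-separated .loc lines into dicts keyed by COLUMNS."""
    entries = []
    for line in handle:
        line = line.rstrip("\r\n")
        # Skip blank lines and comments
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != len(COLUMNS):
            raise ValueError("expected %d columns: %r" % (len(COLUMNS), line))
        entries.append(dict(zip(COLUMNS, fields)))
    return entries


if __name__ == "__main__":
    sample = io.StringIO(
        "# value\tdbkey\tname\tpath\n"
        "hg18\thg18\tHuman genome bld 36 (hg18)\t"
        "/mnt/genomes/hg18/bwa/base/build36.fa\n"
    )
    for entry in read_loc(sample):
        print(entry["value"], "->", entry["path"])
```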

Other useful input: conditional

No Yes

Output section

It is easiest if your script accepts the name of the output file to write the results to. Galaxy then passes the actual output file names to your program.

myscript.pl -i $input -o $trim_fasta

Important: set the format to the correct type!

Optional output files can be handled with the <conditional> tag set, by linking it to the <filter> tag in the outputs.
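
A script that takes its input and output file names as options, in the style of the myscript.pl -i ... -o ... call above, could look like the following Python sketch. The computation itself (copying lines while dropping trailing whitespace) is only a placeholder, and the function names are illustrative.

```python
import argparse
import sys


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Toy Galaxy tool script")
    parser.add_argument("-i", "--input", required=True, help="input file")
    parser.add_argument("-o", "--output", required=True,
                        help="output file (the name is chosen by Galaxy)")
    return parser.parse_args(argv)


def run(input_path, output_path):
    """Placeholder computation: copy lines, dropping trailing whitespace."""
    with open(input_path) as fin, open(output_path, "w") as fout:
        for line in fin:
            fout.write(line.rstrip() + "\n")


def main(argv=None):
    args = parse_args(argv)
    run(args.input, args.output)


# Only run as a script when options are given on the command line
if __name__ == "__main__" and len(sys.argv) > 1:
    main()
```

Because the script never picks its own output name, Galaxy stays free to decide where the dataset file lives.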

How to integrate a tool?

You have: a script that accepts parameters and writes the results to a text file.

TODO
1. put your script in ~/galaxy-dist/tools/mytools/
2. in that directory, create a mytool.xml file, pointing to that tool, with all tag sets set correctly.
3. in ~/galaxy-dist/tool_conf.xml, enter a line with your tool xml file
4. restart galaxy: # service galaxyd restart
(4'. optional: change the location of your tool in integrated_tool_panel.xml and restart again)
5. There's the magic. Enjoy your tool!
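
Step 3 is normally a one-line manual edit. For illustration only, the same edit can be sketched in Python, assuming tool_conf.xml has a toolbox root with section elements that contain tool file="..." entries relative to the tools directory. The function name register_tool and the section id and name used here are hypothetical.

```python
import xml.etree.ElementTree as ET


def register_tool(tool_conf_xml, section_id, section_name, tool_file):
    """Return tool_conf XML text with tool_file listed under the section."""
    root = ET.fromstring(tool_conf_xml)
    section = root.find("section[@id='%s']" % section_id)
    if section is None:
        # Create the section when it does not exist yet
        section = ET.SubElement(root, "section",
                                id=section_id, name=section_name)
    # Avoid a duplicate entry if the tool is already registered
    if not any(t.get("file") == tool_file for t in section.findall("tool")):
        ET.SubElement(section, "tool", file=tool_file)
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    original = "<toolbox></toolbox>"
    print(register_tool(original, "mytools", "My tools", "mytools/mytool.xml"))
```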

Wrapping Binaries

Things get a bit difficult with wrapper scripts: scripts that drive a third party binary, which needs to be available on the system. I have installed 3rd party binaries in:

/opt

(In one case, I found myself writing a python script, to drive a 3rd party bash script, that consecutively executed a JAVA binary and an R command, to generate a PDF document. The correct implementation: execute the JAVA binary, generate text. Let visualisation tools in Galaxy generate graphs)
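
A wrapper that drives a third-party binary can stay small. The Python sketch below runs a command, writes its standard output to the output file, and fails loudly on a non-zero exit status. The helper name run_binary is made up for this example, and the Python interpreter stands in for a real binary installed under /opt.

```python
import subprocess
import sys


def run_binary(command, output_path):
    """Run command (a list of arguments) and write its stdout to output_path."""
    with open(output_path, "w") as out:
        result = subprocess.run(command, stdout=out,
                                stderr=subprocess.PIPE, text=True)
    if result.returncode != 0:
        # Pass the binary's error messages on before failing
        sys.stderr.write(result.stderr)
        raise RuntimeError("%s exited with status %d"
                           % (command[0], result.returncode))
    return output_path


if __name__ == "__main__":
    # Stand-in for a real binary: the running Python interpreter
    run_binary([sys.executable, "-c", "print('text output')"], "result.txt")
```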

Tool dependencies

Some tools in the Toolshed require common code base: e.g. R, samtools, GATK

In your .xml you specify these requirements:

Tool dependencies

In your .xml these requirements must match the tool_dependencies.xml
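
The XML shown on these slides was lost in extraction. As a hedged illustration of the matching rule, the Python sketch below compares requirement entries of type package in a tool XML with package entries in a tool_dependencies.xml. The element and attribute names follow the Tool Shed conventions as far as I know them, and the samtools name and version are example values.

```python
import xml.etree.ElementTree as ET

TOOL_XML = """<tool id="mytool" name="My tool">
  <requirements>
    <requirement type="package" version="0.1.18">samtools</requirement>
  </requirements>
</tool>"""

DEPENDENCIES_XML = """<tool_dependency>
  <package name="samtools" version="0.1.18"/>
</tool_dependency>"""


def requirements(tool_xml):
    """Return the set of (name, version) package requirements in a tool XML."""
    root = ET.fromstring(tool_xml)
    return {(r.text.strip(), r.get("version", ""))
            for r in root.iter("requirement") if r.get("type") == "package"}


def packages(deps_xml):
    """Return the set of (name, version) packages in a tool_dependencies.xml."""
    root = ET.fromstring(deps_xml)
    return {(p.get("name"), p.get("version", "")) for p in root.iter("package")}


def unmatched(tool_xml, deps_xml):
    """Requirements that have no package with the same name and version."""
    return sorted(requirements(tool_xml) - packages(deps_xml))


if __name__ == "__main__":
    print(unmatched(TOOL_XML, DEPENDENCIES_XML))
```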

Tool_dependencies.xml

1. define a dependency as a Toolshed repository of the type 'tool dependency definition'

2. or write the instructions to install the dependency directly in tool_dependencies.xml, and make it available system-wide.

Galaxy aims to be platform independent, so this is A HELL OF A JOB.

http://wiki.galaxyproject.org/ToolShedToolFeatures#Automatic_third-party_tool_dependency_installation_and_compilation_with_installed_repositories

Tool_dependencies.xml

This is the simplest you can get. Really.

Tool_dependencies.xml

A more complex example

Lesson 1

It pays off to use / build on repositories started by others.

The problem is the testing

1. build your tool and make it work in your Galaxy
2. define your dependencies
3. search the (test) Toolshed for repositories you can use as tool dependency definitions (they just install packages, without providing an interface).
4. put them as requirements in your tool.xml
5. for the ones you do not find: decide whether to create a separate tool dependency definition and integrate it
OR
5'. add them to your dependencies.xml file.
6'. update/upload to a Toolshed
7'. fire up a test Galaxy, and plug the tool in to see whether it works.

The problem is the testing

You might consider a virtual test machine, e.g. in VirtualBox.

1. install your OS
2. fetch Galaxy
3. get universe_wsgi.ini ready (admin, location,...)
4. plug in your repository
5. SNAPSHOT your machine
6. install your tool through the graphical interface
7. determine what went wrong
7'. update the repository
7''. and restore the snapshot
8. iterate until SUCCESS!

Tool dependencies

Dependencies

IGENOMES (http://support.illumina.com/sequencing/sequencing_software/igenome.ilmn)

gtf file: $IGENOMES_ROOT/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/genes.gtf

reference whole genome sequence: $IGENOMES_ROOT/Mus_musculus/Ensembl/GRCm38/Sequence/WholeGenomeFasta/

reference chromosome sequences: $IGENOMES_ROOT/Mus_musculus/Ensembl/GRCm38/Sequence/Chromosomes/

PHIX-control sequences: $IGENOMES_ROOT/Mus_musculus/Ensembl/GRCm38/Sequence/AbundantSequences/phix.fa

TopHat2 (Bowtie2) and STAR indexes:
$IGENOMES_ROOT/Mus_musculus/Ensembl/GRCm38/Sequence/Bowtie2Index
$IGENOMES_ROOT/Mus_musculus/Ensembl/GRCm38/Sequence/STARIndex

Chr size file: $IGENOMES_ROOT/Mus_musculus/Ensembl/GRCm38/Annotation/Genes/ChromInfo.txt

Binaries

STAR (https://code.google.com/p/rna-star/)
TOPHAT2 (http://tophat.cbcb.umd.edu/)
BLASTP (ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/) or USEARCH (http://www.drive5.com/usearch/download.html)
R (http://www.r-project.org/)
SAMTOOLS (http://sourceforge.net/projects/samtools/files/samtools/)
GATK (http://www.broadinstitute.org/gatk/download)
PICARD (http://sourceforge.net/projects/picard/files/picard-tools/)
SQLITE3 (http://www.sqlite.org/download.html)

Custom Ensembl SQLite DB

Tables included: coord_system, exon_transcript, intergene (made by the intergenic TIScalling script, based on gene), transcript, exon, gene, seq_region, translation

Extra table with rRNA molecules still necessary (TODO)

Perl Packages

Parallel::ForkManager
DBI
DBD::SQLite
Storable 'dclone'
Getopt::Long
Cwd
LWP::UserAgent
XML::Smart
BioPerl

data