bioinformatics alchemy 101 transmuting dark script matter into reusable tools - ross lazarus


Upload: australian-bioinformatics-network

Post on 10-May-2015

DESCRIPTION

Reproducibility is a fundamental goal of good experimental science. Despite the increasing availability and deployment of analytic frameworks such as Galaxy, readily reproducible bioinformatic analysis remains difficult to achieve. Mature, complex workflows often require small tweaks to accommodate the idiosyncrasies of new datasets, but integrating the required new capabilities into the framework is prohibitively complex and expensive. As a result, when problems are encountered in an existing pipeline, data may be temporarily diverted for manual processing outside the framework. These manual steps typically involve relatively trivial, transient, undocumented and poorly curated programs or scripts. This "dark script matter" rarely reaches the local version control or archiving systems where production code is maintained, which threatens the goal of reproducible analysis. The Galaxy Tool Factory is a Galaxy tool that allows scripts (R, Perl, Python, Bash...) to be run directly and repeatably through the normal Galaxy interface. The Tool Factory optionally generates all the boilerplate code needed for a new Galaxy tool that permanently wraps the script for reuse. Newly generated tools can be uploaded to a local or remote Galaxy Tool Shed. Tools can be installed in a running Galaxy server from any Tool Shed through the administrative interface for subsequent use in workflows and analyses. The conversion of a trivial script into a working, shareable Galaxy tool will be demonstrated during the presentation.

TRANSCRIPT

Page 1: Bioinformatics alchemy 101   transmuting dark script matter into reusable tools - ross lazarus


Bioinformatic Alchemy 101

Transmuting dark script

matter into reusable tools

Ross Lazarus

BakerIDI

Page 2:

Context: bioinformatic analyses

Big data; complex analyses

Repeatable, automated pipelines

Reproducibility real goal

Reproducibility is hard

Page 3:

Frameworks

Eg VGL

Local SOPs for biologists

Tools, canned workflows

Minimise opportunities for error

Maximise reproducibility

Page 4:

In real life

90/10 rule

Need to tweak SOPs

Trivial 'disposable' scripts

Not documented or curated

Not reliably available to re-run

“Dark script matter”

Page 5:

Dark Script Matter

Outside usual VCS/pipelines

Manual ≠ reproducible

Necessary evil?

Platform extensions complex

Eg Galaxy – hours of work

Page 6:

Plan

Context: Reproducible analyses

Frameworks vs Dark Scripts

Alchemy: script to Galaxy tool

Demonstration

Summary

Conclusions

Page 7:

Galaxy Tool Factory

An installable Galaxy tool

Runs scripts: Python, R, Perl, sh

Generates new Galaxy tools

Tool code wraps the script

Minutes – not hours

Page 8:

Galaxy Tool Shed

Separate server

Stores/serves Galaxy tools

Admin can install to Galaxy

Mercurial VCS archives

Explicit tool versioning

Sharing and reproducibility

Page 9:

Demo 1: Install the Tool Factory

Page 10:

Demo 2: Create a new tool

Page 11:

Prepare script

Python; R; Perl; Sh

Parse command-line parameters: 1 = input path, 2 = output path

Typically workflow transformations

Arbitrary complexity

Simple example

Write transpose of a tabular file

Page 12:

Prepare/upload test data

SMALL sample input

Becomes functional test case

h1 h2 h3 h4

r11 r12 r13 r14

r21 r22 r23 r24

r31 r32 r33 r34
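Because the sample input becomes the tool's functional test, it helps to know in advance what the transposed output should look like. The following is a minimal sketch of that check, not part of the Tool Factory itself. It assumes the sample file is tab-separated (the slide shows spaces, but the script reads with `sep='\t'`), and the names `SAMPLE_INPUT`, `EXPECTED_OUTPUT`, `transpose_text` and `functional_test` are hypothetical.

```python
# Sample data from the slide, assumed to be tab-separated in the real file.
SAMPLE_INPUT = (
    "h1\th2\th3\th4\n"
    "r11\tr12\tr13\tr14\n"
    "r21\tr22\tr23\tr24\n"
    "r31\tr32\tr33\tr34\n"
)

# The transpose turns each input column into an output row.
EXPECTED_OUTPUT = (
    "h1\tr11\tr21\tr31\n"
    "h2\tr12\tr22\tr32\n"
    "h3\tr13\tr23\tr33\n"
    "h4\tr14\tr24\tr34\n"
)


def transpose_text(text):
    """Transpose tab-separated text held in memory (hypothetical helper)."""
    rows = [line.split("\t") for line in text.splitlines() if line]
    # zip(*rows) yields one tuple per input column.
    return "".join("\t".join(column) + "\n" for column in zip(*rows))


def functional_test():
    """Return True when the transposed sample matches the expected output."""
    return transpose_text(SAMPLE_INPUT) == EXPECTED_OUTPUT


print("functional test passed" if functional_test() else "output mismatch")
```

Keeping the sample this small makes a mismatch easy to diagnose by eye.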

Page 13:

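# R: transpose a tabular input file and write it as a tabular output file
ourargs = commandArgs(TRUE)   # trailing command-line arguments only
inf = ourargs[1]              # parameter 1: input path
outf = ourargs[2]             # parameter 2: output path
# header=FALSE keeps the first line as data, so it is transposed with the rest
inp = read.table(inf, header=FALSE, row.names=NULL, sep='\t')
outp = t(inp)
write.table(outp, outf, quote=FALSE, sep="\t", row.names=FALSE, col.names=FALSE)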
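Since the Tool Factory also accepts Python scripts, the same transformation can be sketched in Python using the same convention (parameter 1 is the input path, parameter 2 the output path). This is an illustrative equivalent rather than code from the presentation. The function name `transpose_tabular` is hypothetical, and the sketch assumes every row has the same number of tab-separated fields.

```python
import sys


def transpose_tabular(in_path, out_path):
    """Write the transpose of a tab-separated file (hypothetical helper)."""
    with open(in_path) as handle:
        # Blank lines are skipped; every other line becomes a list of fields.
        rows = [line.rstrip("\n").split("\t") for line in handle if line.strip()]
    with open(out_path, "w") as handle:
        # zip(*rows) yields the columns; it truncates to the shortest row,
        # so ragged input would silently lose fields.
        for column in zip(*rows):
            handle.write("\t".join(column) + "\n")


# Only act when called with exactly an input path and an output path.
if __name__ == "__main__" and len(sys.argv) == 3:
    transpose_tabular(sys.argv[1], sys.argv[2])
```

No header handling is needed because, as in the R version, the first line is treated as ordinary data.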

Page 14:

Demo part 1

As an admin, test run the code

Can't make a new tool until it works!

Admin-only real-time scripting in Galaxy.

Overrides ALL other security.

Generated tools run with normal security.

Page 15:

Use Redo button; Generate

When working right

Use Redo to save retyping

Select Generate option

Provide tool ID, help text

Execute

Expect a toolfactory.gz in history

Copy link (floppy disk icon)

Page 16:

What's in the toolshed.gz?

A gzip'd mercurial tool repository (!)

Auto generated tool XML file

Auto generated tool python wrapper

Functional test case - the sample data

Familiar Galaxy tool for all users

Executes your script over their data

Interoperably inside Galaxy
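The generated Python wrapper is not shown in the slides. As a rough idea of what "executes your script over their data" involves, here is a minimal sketch that runs a script with an input path and an output path, following the 1 = in, 2 = out convention. It is an assumption about the general shape of such a wrapper, not the Tool Factory's actual code, and `run_script` is a hypothetical name.

```python
import subprocess
import sys


def run_script(interpreter, script_path, in_path, out_path):
    """Run `interpreter script_path in_path out_path`.

    Returns (exit_code, stderr_text) so the caller can report failures.
    Hypothetical helper; the real generated wrapper may differ.
    """
    completed = subprocess.run(
        [interpreter, script_path, in_path, out_path],
        capture_output=True,  # keep stdout/stderr instead of printing them
        text=True,            # decode the captured streams as text
        check=False,          # report the exit code rather than raising
    )
    return completed.returncode, completed.stderr


# Example call with hypothetical paths, left commented out:
# code, err = run_script("Rscript", "my_script.R", "input.tabular", "output.tabular")
# code, err = run_script(sys.executable, "my_script.py", "input.tabular", "output.tabular")
```

Passing the arguments as a list avoids shell quoting problems with unusual file names.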

Page 17:

Upload TS gzip to new repository

Upload to any tool shed

Create new repo; sensible name!

Choose Upload files to new repo

Paste URL (floppy disk save icon)

New tool ready to install

Page 18:

Install and Test New Tool

Back to Galaxy admin interface

Browse local tool shed

Choose new tool

Install to local Galaxy

Try it out

Run functional test

Page 19:

Summary

GTF = script to tool in minutes

Integrated with Galaxy and TS

Simple workflow components

If needed, generate simple tool

Then add parameters manually

Page 20:

Tool Factory Operation Guide

Script (Python, R, Perl, sh)

Upload/paste sample input for functional test

Galaxy Tool Factory tool form: paste script; test run; check outputs; rerun/fix

Generate TS gzip; copy download link for pasting

Tool Shed: create new repository; upload files (paste TS gzip link and upload)

Install new tool from Tool Shed from Galaxy admin page; test; functional test

Page 21:

GALAXY

http://usegalaxy.org

Page 22:

Generate a new Galaxy tool

Galaxy Tool Factory

From a Python, R, Perl or Bash script

# transpose a tabular input file and write as a tabular output file
ourargs = commandArgs(TRUE)
inf = ourargs[1]
outf = ourargs[2]
inp = read.table(inf, header=FALSE, row.names=NULL, sep='\t')
outp = t(inp)
write.table(outp, outf, quote=FALSE, sep="\t", row.names=FALSE, col.names=FALSE)

Using a Galaxy tool

Via a Tool Shed
