

An Introduction to Designing and Executing Workflows with Taverna

Katy Wolstencroft

University of Manchester


This tutorial gives you a basic introduction to designing and reusing workflows in Taverna, and to some of its main features.

Workflows in this practical use small data sets and are designed to run in a few minutes. In the real world, you would be using larger data sets and workflows would typically run for longer.


Exercise 1: Exploring the Workbench

Taverna can be downloaded from http://www.taverna.org.uk/

Go to the page and find the latest version (2.4)

Follow the instructions on the website to install Taverna for your operating system (this is a simple one-click install for Windows and Mac. For Linux, you may also need the GraphViz program. Follow the link on the Taverna download page if so)

The following page shows a screenshot of Taverna and the different panels that make up the workbench


Taverna Workbench

[Screenshot: the Taverna Workbench, with the Services Panel, the Workflow Diagram and the Workflow Explorer labelled]


1. Workflow Diagram

The workflow diagram is the visual representation of the workflow. It:

Shows inputs, outputs, services and data flows

Allows editing of the workflow by dragging and dropping and connecting services together

Enables saving of workflow diagrams for publishing and sharing


1. Workflow Explorer

The Workflow Explorer shows the detailed view of your workflow. It shows default values and descriptions for service inputs and outputs, and it shows where remote services are located. It also shows configuration details, such as iteration and looping.

Workflow validation details can also be found here. Before a workflow is run, Taverna checks to see if it is connected correctly and if its services are available.


1. Available Services Panel

Lists services available by default in Taverna:

Local Java services

WSDL Web Services (secure and public)

RESTful services

R Processor services (for statistical analyses)

Beanshell scripts

XPath scripts

Spreadsheet import service

The services panel also allows you to add new services or workflows from the web or from file systems – there are loads more available!


Exercise 2: Building a Simple Workflow

We will start with something easy: retrieving a protein sequence from a remote database and identifying functional motifs

Go to the Services Panel

Type ‘Fasta’ into the ‘search’ box at the top of the panel

You will see several services in the search results

Select ‘Get Protein FASTA’ and drag-and-drop it into the workflow diagram panel.


Exercise 2: Building a Simple Workflow

In a blank space in the workflow diagram, right-click and select “Workflow input port” from the “Insert” section

Type in a name for this input (e.g. ID) and click “ok”

Do the same to create a new workflow output. Call this output “sequence”


Exercise 2: Building a Simple Workflow

You now have 3 boxes in the diagram and we need to connect them up.

Click on the input box, drag towards “Get Protein FASTA”, and let go. An arrow will connect the two boxes


Exercise 2: Building a Simple Workflow

Click on the output box, drag towards “Get Protein FASTA”, and let go. An arrow will connect the two boxes

You have now built your first workflow!

It should look something like this


Exercise 2: Building a Simple Workflow

Run the workflow by selecting “File -> run workflow”, or by clicking on the play button at the top of the workbench


Exercise 2: Building a Simple Workflow

An input window will appear. As you can see, we have not yet added a description of the workflow or of the input

Click on ‘Set Value’ in the input window and add a UniProt protein identifier (e.g. P15409) where it says “some input data goes here”


Exercise 2: Building a Simple Workflow

Click “run workflow”

In the bottom left of the results window, click on the results.

You will now see a protein sequence from UniProt
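As a rough way to picture what this first workflow does, the three connected boxes behave like a function with one named input and one named output. The Python below is only an analogy, not how Taverna is implemented. `get_protein_fasta` is a hypothetical stand-in that returns a placeholder record instead of calling the remote service, so the example runs offline.

```python
def get_protein_fasta(protein_id):
    """Hypothetical stand-in for the remote 'Get Protein FASTA' service.

    The real service contacts a remote database. This stub only returns a
    placeholder FASTA-style record (not a real sequence).
    """
    return ">" + protein_id + "\nPLACEHOLDERSEQUENCE"


def run_workflow(inputs):
    """Pass the 'ID' workflow input through the service to the 'sequence' output."""
    fasta = get_protein_fasta(inputs["ID"])
    return {"sequence": fasta}


# Supplying a value for the 'ID' input port, as in the input window.
results = run_workflow({"ID": "P15409"})
print(results["sequence"])
```

The dictionary keys play the role of the workflow input and output ports: the caller only needs to know the port names, not what happens between them.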

Now we will find out what functional motifs the protein contains, but first we have to tell Taverna about some new services


Exercise 2: Adding New Services

Go to the services panel in Taverna and click “import new services”. For each type of service, you are given the option to add a new service

Select ‘Soaplab service…’ A window will pop-up asking for a web address


Exercise 2: Adding New Services

Enter the address for the Soaplab services: http://wsembnet.vital-it.ch/soaplab2/services

Scroll down the Services list and look at the new Soaplab services that are now included.


Exercise 2: Building a Simple Workflow

In the services panel, search for pscan – it should be in the Soaplab services you just added

Drag and drop this service onto the workflow diagram


Exercise 3: Adding more Services

We can connect the two services together in the same way as before

At the top of the workflow diagram panel, change the view to show all ports by clicking on the icon shown below

This view allows you to see any data input/output or parameter value options for your chosen service

Show all ports icon


Exercise 3: Adding more Services

As you can see, pscan has a lot more ports. Most of the time, you don’t need to connect all ports. Some are optional and some already have default values set. Service documentation should tell you this. You can use the BioCatalogue to find documentation and user descriptions

Change the orientation of the port names to fit them on the screen more easily by clicking on the icon shown below

change orientation


Exercise 3: Adding more Services

Connect ‘output_text’ from the ‘Get_protein_Fasta’ service to the ‘sequence_direct_data’ input of pscan

Also, create a new workflow output called pscanOut and connect it to ‘pscan -> outfile’
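The wiring at this point can be written down as a list of data links, which makes it easier to see which steps depend on which. The sketch below is an illustration only. It assumes Python 3.9 or later (for the standard `graphlib` module), it records links between boxes rather than individual ports, and it does not reflect how Taverna schedules services internally.

```python
from graphlib import TopologicalSorter

# Data links built so far in the exercise, written as (source, sink).
links = [
    ("ID", "Get_protein_Fasta"),        # workflow input -> first service
    ("Get_protein_Fasta", "sequence"),  # first service -> workflow output
    ("Get_protein_Fasta", "pscan"),     # output_text -> sequence_direct_data
    ("pscan", "pscanOut"),              # outfile -> new workflow output
]

# TopologicalSorter expects a mapping of node -> set of nodes it depends on.
graph = {}
for source, sink in links:
    graph.setdefault(source, set())
    graph.setdefault(sink, set()).add(source)

# One valid order in which data can flow through the boxes.
order = list(TopologicalSorter(graph).static_order())
print(order)
```

Any order produced this way places `ID` before `Get_protein_Fasta`, and `Get_protein_Fasta` before `pscan`, because each box needs the data produced upstream of it.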


3: Adding a Workflow Description

Right-click on a blank part of the workflow diagram and select “Annotate”

Add some details about the workflow e.g. who is the author, what does it do

You can also add examples and descriptions for the workflow inputs by selecting them and selecting “Annotate”

Add an example for the protein ID (e.g. P15409)

Save the workflow by going to “File -> save workflow”

Run the workflow again and look at the results


4: Using REST Services

The services we have used up until now have been Soaplab services, but Taverna can also run WSDL and RESTful services

Go to the Service Catalogue tab of Taverna and search for dbfetch

From the REST Service results, select GET /dbfetch/{db}/{id}

Right-click on the service and select “Add to Service Panel”


4: Using REST Services

Searching the service catalogue


4: Using REST Services

In the services search panel in Taverna, search for dbfetch

Right-click on the service and choose “Add to workflow with name…”

Enter a name such as “dbfetch” and click OK


4: Using REST Services

As you can see, the items from the dbfetch template become inputs in Taverna.
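To see why the template items turn into inputs, it can help to treat the template as a string with named placeholders. The following Python is a minimal sketch of that idea, not Taverna's own code. The helper names `template_ports` and `expand` are made up for this example, and only the path part of the template is used because that is all the slide shows.

```python
import re
from urllib.parse import quote

TEMPLATE = "/dbfetch/{db}/{id}"
PLACEHOLDER = re.compile(r"\{(\w+)\}")


def template_ports(template):
    """Return the placeholder names, in order; each one becomes an input."""
    return PLACEHOLDER.findall(template)


def expand(template, values):
    """Fill every placeholder with its URL-encoded value."""
    return PLACEHOLDER.sub(lambda m: quote(values[m.group(1)], safe=""), template)


ports = template_ports(TEMPLATE)
path = expand(TEMPLATE, {"db": "uniprotkb", "id": "P15409"})
print(ports)  # ['db', 'id']
print(path)   # /dbfetch/uniprotkb/P15409
```

A value must be supplied for every placeholder before the request can be made, which is why each one appears as a separate input.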


4: Using REST Services

You can also enter the template directly

Right-click on an empty area of the workflow and select “REST” from the “Insert” section

Enter the template and click OK


4: Using REST Services

For this service, we need to supply a database name and a protein ID.

Connect the protein ID input to the REST service ID input port

Right-click on the ‘db’ input port on the REST service and select ‘constant value’.

Add the constant value ‘uniprotkb’ and click “OK”

Add a workflow output port and connect it to the REST ‘response body’ output port

Your workflow should look something like the one on the next slide

Save and run your workflow

Now your results will include the UniProt entry for your protein
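Setting a constant value on one port while leaving the other connected to a workflow input is similar to fixing one argument of a function in advance. The sketch below illustrates this with `functools.partial`. `dbfetch_request` is a hypothetical stand-in that only builds the request path from the template on the earlier slide and does not contact any server.

```python
from functools import partial


def dbfetch_request(db, entry_id):
    """Hypothetical stand-in for the REST service: returns the request path only."""
    return "/dbfetch/" + db + "/" + entry_id


# Fixing 'db' mirrors the constant value 'uniprotkb' set on the db port.
# Only the protein ID is left to be supplied when the workflow runs.
fetch_uniprot_entry = partial(dbfetch_request, "uniprotkb")

print(fetch_uniprot_entry("P15409"))
```

The person running the workflow then only has to provide the protein ID, which is the same input that already feeds the first service.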


4: Using REST Services