Taverna workflows in caGrid

Taverna workflows in caGrid caGrid Architecture Face-to-face meeting Stian Soiland-Reyes & Aleksandra Nenadic, myGrid University of Manchester, UK Boston, 2009-05-11 http://www.mygrid.org.uk/dev/wiki/display/caGrid


Taverna workflows in caGrid

caGrid Architecture Face-to-face meeting

Stian Soiland-Reyes & Aleksandra Nenadic, myGrid
University of Manchester, UK

Boston, 2009-05-11
http://www.mygrid.org.uk/dev/wiki/display/caGrid


Agenda
– What is a Taverna Workflow?
– Abstract caGrid workflow example
– Actual Taverna workflow
– caGrid plugin for Taverna
– Current work
– Where do we go next?


What is a Taverna workflow?
• Set of services (web services, RESTful, local scripts, other workflows, etc.)
• Set of data links between services - “put output X from service A as input Y to service B”
  – If needed: list handling, control links
• This can be called a data-oriented workflow (dataflow)
  – Say where you want the data to flow instead of what you want to do
  – Compare with more procedural workflow languages like BPEL
• Beneficial way of thinking for much data-driven scientific research
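The dataflow idea above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Taverna's engine or workflow format: services are plain functions, each with a single output, and `run_dataflow`, `services`, `links` and the service names "A" and "B" are all hypothetical. The point is that only the data links are declared; the execution order falls out of them.

```python
# Minimal dataflow sketch (hypothetical names; not Taverna's actual engine).
# Each service is (function, [input port names]) and has one output.
# A data link maps (service, input port) -> the source that feeds it, where a
# source is either a workflow input name or the name of another service.

def run_dataflow(services, links, workflow_inputs):
    values = dict(workflow_inputs)   # data available so far, keyed by source name
    pending = set(services)          # services that have not run yet
    while pending:
        # A service is ready once every one of its input ports has data.
        ready = [name for name in pending
                 if all(links[(name, port)] in values
                        for port in services[name][1])]
        if not ready:
            raise ValueError("unsatisfied or cyclic data links: %s" % sorted(pending))
        for name in ready:
            func, ports = services[name]
            args = {port: values[links[(name, port)]] for port in ports}
            values[name] = func(**args)
            pending.remove(name)
    return values


# "Put the output of service A as input y to service B."
services = {
    "B": (lambda y: len(y), ["y"]),               # listed first on purpose
    "A": (lambda text: text.upper(), ["text"]),
}
links = {
    ("A", "text"): "raw",   # workflow input "raw" feeds A's port "text"
    ("B", "y"): "A",        # A's output feeds B's port "y"
}
result = run_dataflow(services, links, {"raw": "mkv"})
print(result["A"], result["B"])
```

Note that "B" is listed before "A", yet "A" still runs first because "B" is waiting on its data. A procedural language such as BPEL would instead have the author spell out that ordering.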


Abstract caGrid workflow

• Query the CPAS data service to find protein sequence

• Use (parts of) result to query GridPIR and caBIO data services for matching sequences


Actual Taverna workflow
• Looks very similar to abstract workflow
• Introduces shim services to build and parse data elements

Orange: Local scripts to parse the description string and build CQL queries

Purple: Build/parse complex type for web service input/output

Blue: Constant CQL query

Green: caGrid WSDL services

http://www.myexperiment.org/workflows/752
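To give a feel for what such a shim does, here is a rough Python sketch of a "parse the description, then build a CQL query" step. It is not the script from the workflow above. The description format (`accession=<ID>`), the function names, and the target class and attribute names are all hypothetical placeholders. The XML layout (a `CQLQuery` element containing a `Target` with an `Attribute` predicate) and the namespace URI follow our recollection of CQL 1 and should be checked against the caGrid documentation before use.

```python
import re
import xml.etree.ElementTree as ET

# Assumed CQL 1 namespace; verify against the caGrid schema before relying on it.
CQL_NS = "http://CQL.caBIG/1/gov.nih.nci.cagrid.CQLQuery"


def extract_accession(description):
    """Hypothetical parse shim: pull an 'accession=<ID>' token out of free text."""
    match = re.search(r"accession=([A-Za-z0-9_.-]+)", description)
    if match is None:
        raise ValueError("no accession found in description")
    return match.group(1)


def build_cql_query(target_class, attribute, value, predicate="EQUAL_TO"):
    """Build a one-attribute CQL query as an XML string."""
    ET.register_namespace("", CQL_NS)  # serialize with a default namespace
    query = ET.Element("{%s}CQLQuery" % CQL_NS)
    target = ET.SubElement(query, "{%s}Target" % CQL_NS, {"name": target_class})
    ET.SubElement(target, "{%s}Attribute" % CQL_NS,
                  {"name": attribute, "value": value, "predicate": predicate})
    return ET.tostring(query, encoding="unicode")


# Placeholder class and attribute names, for illustration only.
accession = extract_accession("protein hit; accession=P12345; score=0.98")
cql = build_cql_query("example.domain.Protein", "accession", accession)
print(cql)
```

In a workflow, a shim like this would sit between the service that returns the description string and the data service that accepts the query, which is the role of the orange boxes.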


caGrid plugin for Taverna (1)
• Discover/browse services registered in the caGrid Index Service

• Easy to install into Taverna:

• Listing all services:


caGrid plugin for Taverna (2)

• …or by semantic search:


Current work by myGrid & caGrid
• Develop Taverna support for GAARDS-secured caGrid services
• Wrap existing 3rd party services (that are used by existing Taverna users) for caGrid and annotate them to match Silver-level compatibility guidelines
• Taverna workflow as a caGrid service
• Service discovery improvements
• Documentation, building example workflows


Real example: Lymphoma type prediction
• Scientific value
  – Using gene-expression patterns associated with DLBCL and FL to predict the lymphoma type of an unknown sample.
  – Using SVM (Support Vector Machine) to classify data, and predicting the tumor types of unknown examples.
• Main steps
  – Query training data from experiments stored in caArray
  – Preprocess (normalize) the microarray data.
  – Add training and testing data into SVM service to get classification results

*Fig. from MA Shipp. Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nature Medicine, 2002
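The main steps (get training data, normalize, train a classifier, predict) can be sketched locally in plain Python. This is a toy illustration only. It does not call caArray or the caGrid SVM service, the expression values are made-up numbers rather than real data, and the classifier is a small linear SVM trained by subgradient descent on the hinge loss, used here as a stand-in for whatever SVM implementation the service uses. All function names are hypothetical.

```python
from statistics import mean, pstdev


def fit_zscore(rows):
    """Per-gene (column) mean and standard deviation from the training data."""
    columns = list(zip(*rows))
    return [mean(c) for c in columns], [pstdev(c) or 1.0 for c in columns]


def apply_zscore(rows, means, stdevs):
    """Normalize samples with statistics learned from the training data."""
    return [[(v - m) / s for v, m, s in zip(row, means, stdevs)] for row in rows]


def train_linear_svm(xs, ys, epochs=200, rate=0.1, reg=0.01):
    """Stand-in linear SVM: subgradient descent on the regularized hinge loss."""
    weights, bias = [0.0] * len(xs[0]), 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            margin = y * (sum(w * v for w, v in zip(weights, x)) + bias)
            weights = [w * (1 - rate * reg) for w in weights]  # regularization shrink
            if margin < 1:  # sample is inside the margin: push the boundary away
                weights = [w + rate * y * v for w, v in zip(weights, x)]
                bias += rate * y
    return weights, bias


def predict(x, weights, bias):
    score = sum(w * v for w, v in zip(weights, x)) + bias
    return "DLBCL" if score >= 0 else "FL"


# Made-up expression values for two genes; +1 = DLBCL, -1 = FL.
train_raw = [[8.0, 1.0], [7.5, 1.5], [9.0, 0.5],
             [2.0, 6.0], [1.5, 7.0], [2.5, 6.5]]
train_labels = [1, 1, 1, -1, -1, -1]
test_raw = [[8.5, 1.0], [2.0, 6.5]]

means, stdevs = fit_zscore(train_raw)                      # "Preprocess"
weights, bias = train_linear_svm(
    apply_zscore(train_raw, means, stdevs), train_labels)  # "Classify"
predictions = [predict(x, weights, bias)
               for x in apply_zscore(test_raw, means, stdevs)]  # "Predict"
print(predictions)
```

In the workflow itself each of these steps is a separate service call, so the same pipeline can be rerun against a different experiment by changing only the query.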


Lymphoma type prediction workflow

Workflow stages (figure labels): Query, Preprocess, Classify & predict

Wei Tan
http://www.myexperiment.org/workflows/746


Lymphoma type prediction results

The (few) classification errors are highlighted

Acknowledgements: Juli Klemm, Xiaopeng Bian, Rashmi Srinivasa (NCI), Jared Nedzel (MIT), Wei Tan


Where do we go next?

• Just some ideas..
  – Tighter integration with caDSR
  – Partial rerun of workflows
  – Improve Taverna’s support for complex XML types
  – Workflow sharing
  – Workflows in caGrid portal
  – Guided workflow building using caGrid metadata
  – Easily build CQL queries from Taverna
• Google Summer of Code 2009


Any questions..?