

Building and Running caGrid Workflows in Taverna

1 Computation Institute, University of Chicago and Argonne National Laboratory, Chicago, IL, USA

2 Mathematics and Computer Science Division, Argonne National Laboratory, Argonne, IL, USA

3 School of Computer Science, University of Manchester, Manchester, UK

OVERVIEW

To help users from the biological and biomedical domains create and execute workflows efficiently, the caGrid Workflow team, together with the ICR working group, selected the Taverna workbench and built a tool suite that orchestrates caGrid data and analytical services for ICR workflows. The tool suite aims to provide an easy-to-use workflow authoring and submission tool that integrates caGrid services as well as third-party services into scientific workflows. We have also helped the caGrid community build several workflows of real scientific value, and we are committed to supporting caBIG users across workspaces in creating and executing their domain-based workflows.

Web Resources:

Taverna: http://taverna.sourceforge.net/

caGrid Plug-in download: http://www.mcs.anl.gov/~wtan/t2/

caBIG: http://www.cagrid.org/mwiki/index.php?title=CaGrid

CaGrid Workflow Quick Start Guide: http://www.cagrid.org/display/workflow/Taverna+Quickstart+Guide

End-to-End Solution for caGrid Workflow

Search the caGrid Index Service for registered caGrid services matching various search criteria: service name, inputs, outputs, research center, class names, concept codes, etc.
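The kind of multi-criteria search described above can be illustrated with a minimal Python sketch. It does not call the caGrid Index Service: `ServiceRecord`, `search`, and the registry entries (service names, centers, concept codes) are all hypothetical stand-ins, and combining the criteria with AND is an assumption made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceRecord:
    """Hypothetical, simplified metadata for one registered service."""
    name: str
    research_center: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)
    concept_codes: list = field(default_factory=list)


def search(registry, name=None, research_center=None, input_class=None,
           output_class=None, concept_code=None):
    """Return the records that satisfy every criterion that was supplied."""
    def matches(rec):
        if name is not None and name.lower() not in rec.name.lower():
            return False
        if research_center is not None and rec.research_center != research_center:
            return False
        if input_class is not None and input_class not in rec.inputs:
            return False
        if output_class is not None and output_class not in rec.outputs:
            return False
        if concept_code is not None and concept_code not in rec.concept_codes:
            return False
        return True

    return [rec for rec in registry if matches(rec)]


# Made-up registry entries, for illustration only.
REGISTRY = [
    ServiceRecord("ArrayDataService", "Center A", ["Query"], ["Microarray"], ["C0001"]),
    ServiceRecord("PreprocessService", "Center B", ["Microarray"], ["Microarray"], ["C0002"]),
    ServiceRecord("SVMService", "Center B", ["Microarray"], ["Prediction"], ["C0003"]),
]

hits = search(REGISTRY, research_center="Center B", output_class="Prediction")
print([rec.name for rec in hits])
```

Omitted criteria are simply ignored, so a call with no arguments returns the whole registry.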

Application: Lymphoma Prediction Workflow*,[1]

• Scientific value

• Use gene-expression patterns associated with Diffuse large B-cell Lymphoma (DLBCL) and Follicular Lymphoma (FL) to predict the lymphoma type of an unknown sample.

• Use GenePattern services SVM and KNN to build the tumor classification model and predict the tumor types of unknown examples.

• Major steps

• Extract Microarray. Query the training data and the unknown sample from experiments stored in caArray.

• Preprocess Microarray. Preprocess (normalize) the microarray data for later processing.

• Predict Lymphoma Type. Predict the lymphoma type using the SVM and KNN services.

• Extension

• Generalized the lymphoma prediction workflow into a cancer type prediction workflow.

• Applied it to Experiment 236 in the caArray database.[2]
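The three major steps can be sketched end to end in plain Python. This is a toy illustration under stated assumptions, not the workflow itself: it does not call caArray or the GenePattern services, the expression values are made up, the clipping thresholds in `preprocess` are illustrative, and only a k-nearest-neighbour vote is shown (the SVM branch is omitted). All function names are hypothetical.

```python
import math
from collections import Counter


def extract(samples):
    """Step 1 stand-in: split records into labelled training data and unknowns."""
    training = [(s["values"], s["label"]) for s in samples if s["label"] is not None]
    unknown = [s["values"] for s in samples if s["label"] is None]
    return training, unknown


def preprocess(values, floor=20.0, ceiling=16000.0):
    """Step 2 stand-in: clip each value to [floor, ceiling], then log2-transform."""
    return [math.log2(min(max(v, floor), ceiling)) for v in values]


def knn_predict(training, query, k=3):
    """Step 3 stand-in: majority vote among the k nearest training samples."""
    nearest = sorted(training, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


def run_workflow(samples, k=3):
    """Chain the three steps: extract, preprocess, predict."""
    training, unknown = extract(samples)
    training = [(preprocess(values), label) for values, label in training]
    return [knn_predict(training, preprocess(u), k) for u in unknown]


# Made-up two-gene expression profiles; label None marks an unknown sample.
SAMPLES = [
    {"values": [1000.0, 50.0], "label": "DLBCL"},
    {"values": [1200.0, 40.0], "label": "DLBCL"},
    {"values": [900.0, 60.0], "label": "DLBCL"},
    {"values": [50.0, 1100.0], "label": "FL"},
    {"values": [40.0, 1000.0], "label": "FL"},
    {"values": [60.0, 1300.0], "label": "FL"},
    {"values": [1100.0, 45.0], "label": None},
    {"values": [55.0, 1200.0], "label": None},
]

print(run_workflow(SAMPLES))  # ['DLBCL', 'FL']
```

Keeping each step as a separate function mirrors the data-flow style of the workflow, where each stage consumes the previous stage's output.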

[Figure: end-to-end caGrid workflow cycle (Discovery, Composition, Execution, Reuse), built around caGrid and the Cancer Data Standards Repository (caDSR). Figure labels:]

• Discovery: service discovery based on caDSR.

• Composition: data-flow modeling flavor; caGrid activity; state management (WSRF); caGrid security.

• Execution: implicit iteration to handle parallel execution; WSRF and security enforcement; workflow as a service.

• Reuse: community reuse; a "Facebook" for workflows.
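The idea behind implicit iteration, as named in the figure, can be sketched as follows: when a step that expects a single item receives a list, it is invoked once per element, and the calls may run in parallel. This is a simplified illustration, not Taverna's implementation, which also has to reconcile list depths across several input ports. `invoke` and `to_upper` are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor


def invoke(service, value, max_workers=4):
    """Call `service` on a single item, or once per element when given a list.

    List elements are processed by a thread pool; results keep input order.
    """
    if isinstance(value, list):
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            return list(pool.map(service, value))
    return service(value)


def to_upper(sequence_id):
    """Stand-in for a service that accepts exactly one item."""
    return sequence_id.upper()


print(invoke(to_upper, "p01234"))
print(invoke(to_upper, ["p01234", "q56789", "a00001"]))
```

The workflow author wires the step once; whether it runs one time or many is decided by the shape of the data that arrives.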


[1] M. A. Shipp, et al. Diffuse large B-cell lymphoma outcome prediction by gene-expression profiling and supervised machine learning. Nature Medicine, vol. 8, 2002.

[2] S. Ramaswamy, et al. Multiclass cancer diagnosis using tumor gene expression signatures. PNAS, vol. 98, p. 15149, 2001.

*Acknowledgement: Juli Klemm, Xiaopeng Bian, Rashmi Srinivasa (NCI); Jared Nedzel (MIT)

Log on to a given Grid and configure each service's security properties with a caGrid credential.

Lymphoma prediction workflow

1. Extract Microarray

2. Preprocess Microarray

3. Predict Lymphoma Type

Semantic search

WSRF Support

Invoke stateful Grid services

caGrid Security Support

Available caGrid Workflows

• caDSR data query

• Protein sequence query

• Microarray clustering

• Lymphoma prediction

• Cancer classification

caGrid workflows at myExperiment: http://www.myexperiment.org/workflows/search?query=cabig

“Facebook” for caGrid workflows

Result of the lymphoma prediction workflow

Result of the cancer type prediction over caArray Experiment 236