building scientific workflows with taverna and bpel: a comparative study in cagrid

23
Building Scientific Workflows with Taverna and BPEL: a Comparative Study in caGrid Wei Tan 1 , Paolo Missier 2 , Ravi Madduri 1 , Ian Foster 1 1 University of Chicago and Argonne National Laboratory, USA 2 School of Computer Science, University of Manchester, Manchester, U.K [email protected] http://www-fp.mcs.anl.gov/~foster/

Upload: diata

Post on 19-Mar-2016

49 views

Category:

Documents


0 download

DESCRIPTION

Building Scientific Workflows with Taverna and BPEL: a Comparative Study in caGrid. Wei Tan 1 , Paolo Missier 2 , Ravi Madduri 1 , Ian Foster 1. [email protected] http://www-fp.mcs.anl.gov/~foster/. 1 University of Chicago and Argonne National Laboratory, USA - PowerPoint PPT Presentation

TRANSCRIPT

Page 1: Building Scientific Workflows with Taverna and BPEL:  a Comparative Study in caGrid

Building Scientific Workflows with Taverna and BPEL:

a Comparative Study in caGrid

Wei Tan1, Paolo Missier2, Ravi Madduri1,Ian Foster1

1 University of Chicago and Argonne National Laboratory, USA

2 School of Computer Science, University of Manchester, Manchester, U.K

[email protected]

http://www-fp.mcs.anl.gov/~foster/

Page 2: Building Scientific Workflows with Taverna and BPEL:  a Comparative Study in caGrid

W. Tan, et al. Building Scientific Workflow with Taverna and BPEL

2

Agenda

• Introduction to caGrid• Why scientific workflows in caGrid?• BPEL and Taverna comparison

- Service discovery

- Service composition & workflow execution

- Data-driven vs. control-driven modeling

- Implicit vs. explicit definition of data

- Implicit vs. explicit iteration on data

- Workflow result analysis

• Conclusion

Page 4: Building Scientific Workflows with Taverna and BPEL:  a Comparative Study in caGrid

W. Tan, et al. Building Scientific Workflow with Taverna and BPEL

As of Oct19, 2008:

122 participants

105 services

70 data

35 analytical

Page 5: Building Scientific Workflows with Taverna and BPEL:  a Comparative Study in caGrid

W. Tan, et al. Building Scientific Workflow with Taverna and BPEL

5caGrid

data

instruments

computation resource

Virtualization

Security

Connectivity

Introduction: caGrid and workflow

Cancer Data Standards Repository

Discovery Composition

Execution

Analysis

Community

Scientific workflow lifecycle

reuse

generate

Page 6: Building Scientific Workflows with Taverna and BPEL:  a Comparative Study in caGrid

W. Tan, et al. Building Scientific Workflow with Taverna and BPEL

Challenges faced by caGrid users

66caGrid

Cancer Data Standards Repository

Discovery Composition

Execution Analysis

Community

reuse

generate

Locating needed servicesDetermining function

Accessing services from a workflow GUI for building workflows easily

Executing workflow efficiently

Persisting and visualizing results

Sharing and reusing workflows

Page 7: Building Scientific Workflows with Taverna and BPEL:  a Comparative Study in caGrid

W. Tan, et al. Building Scientific Workflow with Taverna and BPEL

Our goals in this paper

• Communicate practical experiences based on our work in the caGrid project

• Cover the entire scientific workflow lifecycle, from service discovery to service composition, workflow execution, and workflow result analysis

Based on caGrid requirements for workflow language and tooling

Also applicable to other areas in data-intensive and exploratory science?

7

Page 8: Building Scientific Workflows with Taverna and BPEL:  a Comparative Study in caGrid

W. Tan, et al. Building Scientific Workflow with Taverna and BPEL

BPEL and Taverna

• Not the only two but they are representative choices• BPEL

- XML-based specification for web service based process behavior

- Industry standard adopted by IBM, SAP, Oracle, etc.- Has also attracted attention from the scientific

community because of its support for SOA paradigm• Taverna

- Open-source, from the myGrid consortium in UK- Design and execution of scientific workflows- Plug-in architecture for extension (access more

applications, visualize more data types, etc.)8

Page 9: Building Scientific Workflows with Taverna and BPEL:  a Comparative Study in caGrid

W. Tan, et al. Building Scientific Workflow with Taverna and BPEL

Querying semantic data in cancer research• Identify description logic

concepts relating to a particular context, e.g., “caCore” 1) Query all projects related to

context “caCore” 2) find UML classes in each

project3) use project and UML class

information to query the semantic metadata

4) retrieve the concept code• We adopt this query as a use

case to guide our comparison9

Project

Class

context

Class metadata

Description logic concept

caCore1

2

3

4

Page 10: Building Scientific Workflows with Taverna and BPEL:  a Comparative Study in caGrid

W. Tan, et al. Building Scientific Workflow with Taverna and BPEL

Support for service discovery

• Before building a workflow- Need to find appropriate services to be composed- Service endpoints are not naturally known to users- Exact semantics of those services are not known

Taverna offers- A extensible scavenger interface for arbitrary service discovery

according to users needs (see next page)- A native semantic discovery facility called Feta: myGrid ontology based

service annotation and search.

BPEL offers- UDDI which is not widely adopted- Research efforts like: WSMO, OWL-S, which are more on specification

level- No open-source tool is available that works with a service query

component in an integrated way

10

Page 11: Building Scientific Workflows with Taverna and BPEL:  a Comparative Study in caGrid

W. Tan, et al. Building Scientific Workflow with Taverna and BPEL

11

Solution for caGrid: Metadata-based service query

caGrid service metadata

caGrid scavenger: query the CaDSR Service in the use case

• Types of query- String based- Property based- Semantic based

Page 12: Building Scientific Workflows with Taverna and BPEL:  a Comparative Study in caGrid

W. Tan, et al. Building Scientific Workflow with Taverna and BPEL

Page 13: Building Scientific Workflows with Taverna and BPEL:  a Comparative Study in caGrid

W. Tan, et al. Building Scientific Workflow with Taverna and BPEL

Service composition & workflow execution

• Data-driven vs. control-driven modeling

• Implicit vs. explicit definition of data

• Implicit vs. explicit iteration on data

13

Page 14: Building Scientific Workflows with Taverna and BPEL:  a Comparative Study in caGrid

W. Tan, et al. Building Scientific Workflow with Taverna and BPEL

Data-driven vs. control-driven modeling

14

BPEL Taverna (Scufl)Activities in

modelBasic and structure

activitiesProcessors as data processing

units with in/output portsSemantics

of linksTransfer of control Transfer of data

Data definition

Explicitly defined (global variables)

Implicitly defined (processor’s input/output)

Data initializatio

n

Complex data type mustbe explicitly initialized

Automatically

Control logic

Full-fledged: sequence, conditional, parallel, event-

triggered, etc

Limited: sequential, parallel and conditional

Parallel execution

Defined in <flow> or <ForEach>

By default

Comparison of BPEL and Taverna (Scufl) w.r.t. control/data-flow

Page 15: Building Scientific Workflows with Taverna and BPEL:  a Comparative Study in caGrid

W. Tan, et al. Building Scientific Workflow with Taverna and BPEL

Implicit vs. explicit definition of data

• Taverna- Processors have input/output ports with an

associated data type- Data travels from the output port of a processor to the

input of one or more downstream processors- Interaction among processors is defined entirely by

the arcs in the dataflow graph

• BPEL- Requires the explicit definition of variables, and

explicit initiation for complex types- Data are shared amongst activities (i.e., are global)- More complexity, but more power and flexibility in

data handling 15

Page 16: Building Scientific Workflows with Taverna and BPEL:  a Comparative Study in caGrid

W. Tan, et al. Building Scientific Workflow with Taverna and BPEL

Implicit vs. explicit iteration on data

• Implicit iteration in Taverna- Occurs when an input port receives a list element:

- E.g., a processor that outputs a “list of strings,” can legally be connected to a processor with an input port of type “string.”

- Taverna interprets this type mismatch as an indication that the destination processor must be invoked repeatedly, once for each element of the input list

- This behavior is defined with Taverna's functional programming model

• Explicit iteration in BPEL- BPEL does not allow type mismatch and iterate needs to be

defined explicitly- Again, BPEL offers more flexibility to define more advanced

iteration patterns (with more complexity in the model, though)

16

Page 17: Building Scientific Workflows with Taverna and BPEL:  a Comparative Study in caGrid

W. Tan, et al. Building Scientific Workflow with Taverna and BPEL

Implicit vs. explicit iteration in CaDSR

17

findProjects returns an array Project []

findClassesInProject receives type Project and finds all UML classes in this (single) project

In Taverna an xmlsplitter extracts the project array and feeds this directly into findClassesInProject

In BPEL a ForEach construct is needed for the iteration over array Project []

Page 18: Building Scientific Workflows with Taverna and BPEL:  a Comparative Study in caGrid

W. Tan, et al. Building Scientific Workflow with Taverna and BPEL

Workflow result analysis

• Workflow provides a natural framework for data tracking and analysis- In both Taverna and BPEL

• Taverna: offers native provenance support- More precise linkage annotation between services’

input and output- Semantic support- Not the focus of our project, see ref. [16] [17] for more

details

18

Page 19: Building Scientific Workflows with Taverna and BPEL:  a Comparative Study in caGrid

W. Tan, et al. Building Scientific Workflow with Taverna and BPEL

19

Conclusion: Taverna offers lifecycle support

caGrid

Cancer Data Standards Repository

Discovery composition

Execution Analysis

Community

reuse

generate

Scavenger: for customized service discovery Feta: service annotation and discovery.

Scufl: compact modeling of data flow Built-in processors: Soaplab, BioMart, etc. Customized processors as plug-ins

Implicit iteration: handle parallel execution

Result persistence and visualization

A community for sharing workflows

Provides a compact set of primitives that eases the modeling of data flowsAllows users to specify “what to do” instead of “how to do it”

Page 20: Building Scientific Workflows with Taverna and BPEL:  a Comparative Study in caGrid

W. Tan, et al. Building Scientific Workflow with Taverna and BPEL

20

Conclusion: BPEL offers unique features • Build-time

- A comprehensive set of primitives to model processes of all flavors- control-flow oriented- data-flow oriented (although a little verbose)- event driven, etc.

- Full featured- process logic, data manipulation, event and message

processing, fault handling, etc.• Run-time

- BPEL engines typically run inside application servers with- persistent state storage- reliability and scalability guarantees

- Important for long-running and computation-intensive workflows- For now Taverna engine does not provide these capabilities

Page 21: Building Scientific Workflows with Taverna and BPEL:  a Comparative Study in caGrid

W. Tan, et al. Building Scientific Workflow with Taverna and BPEL

21

Conclusion

• Factors in deciding which language/tool to choose- User IT expertise

- some prefer scripting language, others a friendly GUI- Problem size

- Taverna often runs on desktop and handles problem of moderate size (currently common in bioinformatics)

- Grid/server based systems like Swift can deal with huge volume of data and intensive computation (for example, applications in medical informatics, neuroscience, physics)

- Applications involved- Web services, batch jobs, shell scripts, etc.

• Future work- Enrich the caGrid workflow tool set based on Taverna- Build more real workflows to help scientific investigation- Address issues of scale as they arise

Page 22: Building Scientific Workflows with Taverna and BPEL:  a Comparative Study in caGrid

W. Tan, et al. Building Scientific Workflow with Taverna and BPEL

22

Thank you for your Thank you for your attentionattention

Page 23: Building Scientific Workflows with Taverna and BPEL:  a Comparative Study in caGrid

W. Tan, et al. Building Scientific Workflow with Taverna and BPEL

23

Introduction: caGrid and workflow

caGrid

data

instruments

computation resource

Virtualization Security

Connectivity Cancer Data Standards Repository