
Post on 14-Jan-2016


Taverna Workbench

Stuart Owen

University of Manchester, UK

stuart.owen@manchester.ac.uk

What is a workflow?

• Data workflows
– A task is invoked once its expected data has been received and, when complete, passes any resulting data downstream.
– B starts when it receives data from A.
– C and D run in parallel when they receive data from B.
– E starts once it has received data from both C and D.

• Control workflows
– A task is invoked once the tasks it depends on have completed.
– B starts when A has completed.
– C and D run in parallel once B has completed.
– E starts once both C and D have completed.

[Diagram: example workflow graph with nodes A, B, C and D (side by side), E, F]
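The data-driven ordering described on this slide can be sketched in Python. This is a minimal illustration under stated assumptions, not Taverna's enactor: `run_dataflow`, `make_task`, and the edge list are names invented for the example, and node F from the diagram is left out because the slide text does not describe it. A task fires as soon as every upstream task has delivered data to it; C and D are simply run one after the other here, although nothing stops a real engine from running them in parallel.

```python
from collections import defaultdict

def run_dataflow(edges, tasks, source_input):
    """Run tasks in data-driven order; returns (order, outputs)."""
    upstream = defaultdict(list)
    downstream = defaultdict(list)
    for src, dst in edges:
        upstream[dst].append(src)
        downstream[src].append(dst)

    outputs, order = {}, []
    # Tasks with no upstream edges start from the workflow input.
    ready = [t for t in tasks if not upstream[t]]
    while ready:
        name = ready.pop(0)
        inputs = [outputs[u] for u in upstream[name]] or [source_input]
        outputs[name] = tasks[name](inputs)
        order.append(name)
        for nxt in downstream[name]:
            # A task becomes ready once all of its inputs have arrived.
            arrived = all(u in outputs for u in upstream[nxt])
            if arrived and nxt not in outputs and nxt not in ready:
                ready.append(nxt)
    return order, outputs

# Hypothetical tasks: each one appends its own name to the joined inputs.
def make_task(name):
    return lambda inputs: "+".join(inputs) + ">" + name

edges = [("A", "B"), ("B", "C"), ("B", "D"), ("C", "E"), ("D", "E")]
tasks = {n: make_task(n) for n in "ABCDE"}
order, outputs = run_dataflow(edges, tasks, "in")
print(order)
print(outputs["E"])
```

E only runs after both C and D have produced output, which is the data-workflow rule; a control workflow would use the same graph but trigger on completion rather than on data arrival.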

Advantages of workflows

12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct
12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt
12421 taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt
12481 aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt
12541 ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg
12601 tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga
12661 tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc
12721 atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa
12781 taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa

Advantages to workflows

• High-level abstraction
– Easier to understand and modify.
– Easier to describe and discuss with others.
– Describes what you want to do, not how to do it.

• Automation

• Systematic

• Sharing and re-use
– Either on its own, or within other workflows!

Workflows within Taverna

• Predominantly based around the flow of data, but control constraints are allowed as well.

• Service-oriented workflows. Services may or may not be grid enabled.

• High-level GUI approach separated from lower-level coding: you don’t have to be a coder to build a workflow.

• Enactment can take place separately from the GUI, allowing workflows to be executed from the command line or within other systems.

Taverna 1.4 Workbench

• Integral part of the myGrid project

• Java based, runs on Windows, Mac OS, Linux, Solaris

• Open source and user driven development

• Taverna in OMII-UK
– Dedicated team of developers focused on design, implementation, testing and support, leading to production quality software.
– Development of Taverna 2.0

Taverna 1.4 workbench

[Architecture diagram. Labels shown: Taverna Workbench; Scufl + Workflow Object Model; Freefluo workflow enactor; Enactor; Processor types WebService, Soaplab, LocalApp, BioMOBY and "?"]

SCUFL (Simple Conceptual Unified Flow Language)

[Layer diagram, titled "Workflow Execution":]

• Application data flow layer: Scufl graph + service introspection

• Execution flow layer: list management; implicit iteration mechanism; MIME & semantic type decoration; fault management; service alternates

• Processor invocation layer

Nested workflows

• A processor can be a workflow itself.

• Encourages the reuse of workflows within a more complex scenario.

• Greater abstraction of an overall process making it more manageable.
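The idea that a processor can itself be a workflow can be sketched as follows. This is only an illustration: the `Processor` and `Workflow` classes and the step names are hypothetical, and real Scufl workflows are graphs rather than the simple linear chain assumed here.

```python
class Processor:
    """A single step: wraps a plain function of one input."""
    def __init__(self, name, func):
        self.name = name
        self.func = func

    def run(self, data):
        return self.func(data)


class Workflow(Processor):
    """A linear chain of processors that is itself usable as a processor."""
    def __init__(self, name, steps):
        self.name = name
        self.steps = steps

    def run(self, data):
        # Feed each step's output into the next step.
        for step in self.steps:
            data = step.run(data)
        return data


# Hypothetical inner workflow, reused as one step of an outer workflow.
clean = Workflow("clean", [
    Processor("strip", str.strip),
    Processor("upper", str.upper),
])
outer = Workflow("outer", [
    clean,
    Processor("reverse", lambda s: s[::-1]),
])
print(outer.run("  acgt "))
```

Because `Workflow` exposes the same `run` method as `Processor`, the outer workflow does not need to know that `clean` is composed of several steps.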

Iterations

• Scufl handles iterations implicitly

• i.e. Taverna handles it automagically; there's no need for the user to indicate that an iteration is required.

• Taverna recognises the data mismatch (a list arriving where a single item is expected) and repeatedly runs the task over each data element in the list.

• Iteration strategy with multiple inputs can be configured.

• “Cross product”: all against all

• “Dot product”: first against first, second against second, etc.
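The two iteration strategies can be illustrated with a short Python sketch. The function `implicit_iterate` and the toy `align` task are assumptions made for this example and are not part of Taverna; the sketch also returns a flat list for simplicity, whereas a real engine may preserve list nesting.

```python
from itertools import product

def implicit_iterate(func, *inputs, strategy="cross"):
    """Apply a single-item function over list inputs.

    strategy="cross": every combination of elements (all against all).
    strategy="dot":   elements paired by position (first with first, ...).
    """
    # Wrap any single value in a list so both strategies treat inputs uniformly.
    lists = [x if isinstance(x, list) else [x] for x in inputs]
    if strategy == "cross":
        combos = product(*lists)
    elif strategy == "dot":
        combos = zip(*lists)
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [func(*combo) for combo in combos]

# Hypothetical task that expects one item on each input.
def align(a, b):
    return f"{a}~{b}"

cross = implicit_iterate(align, ["s1", "s2"], ["t1", "t2"], strategy="cross")
dot = implicit_iterate(align, ["s1", "s2"], ["t1", "t2"], strategy="dot")
print(cross)
print(dot)
```

With two inputs of two elements each, the cross product invokes the task four times and the dot product twice.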

What about when a service fails?

• Most services are owned by other people
• No control over service failure
• Some are research level
• Workflows are only as good as the services they connect!
• To help, Taverna can:
– Notify failures
– Instigate retries
– Set criticality
– Substitute alternative services
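The four fault-handling measures listed above (notification, retries, criticality, alternates) can be combined in a small sketch. Everything here is hypothetical: `invoke_with_fallback`, its parameters, and the two toy services are invented for illustration and do not reflect Taverna's actual configuration options.

```python
def invoke_with_fallback(services, data, retries=2, critical=True, notify=print):
    """Try each service in turn, retrying failures before moving on.

    services: primary service first, then alternates (plain callables here).
    Returns the first successful result. If everything fails, raises when
    critical is True and returns None otherwise.
    """
    last_error = None
    for service in services:
        for attempt in range(1, retries + 2):  # first call plus `retries` retries
            try:
                return service(data)
            except Exception as exc:
                last_error = exc
                notify(f"{service.__name__} failed (attempt {attempt}): {exc}")
    if critical:
        raise RuntimeError("all services failed") from last_error
    return None


# Hypothetical services: the primary is always down, the alternate works.
def primary(data):
    raise ConnectionError("service unavailable")

def alternate(data):
    return data.upper()

messages = []
result = invoke_with_fallback(
    [primary, alternate], "acgt", retries=1, notify=messages.append
)
print(result, len(messages))
```

A non-critical processor (`critical=False`) lets the rest of the workflow continue with a missing result, while a critical one aborts the run.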

Provenance Data?

• Supports scientific method and best practice

• Metadata about the origin of a resource (workflow, service, data, experiment hypothesis, etc.) and the process by which a resource was generated.

• The Who?, What?, When?, Where? and Why? about resources.

• Stored as RDF triples

• Also available as OWL, opening it up to complex reasoning
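To make the idea of provenance stored as triples concrete, here is a minimal sketch using plain Python tuples rather than an RDF library. The `ex:` identifiers are made-up placeholders, and the direction assumed for each relation (for example, that a run is `launchedBy` a person) is an assumption for illustration only.

```python
# Each provenance statement is a (subject, predicate, object) triple.
# All identifiers are made-up placeholders, not real LSIDs.
triples = [
    ("ex:run1", "ex:runs", "ex:workflowA"),
    ("ex:run1", "ex:launchedBy", "ex:alice"),
    ("ex:alice", "ex:belongsTo", "ex:labX"),
    ("ex:run1", "ex:executed", "ex:step1"),
    ("ex:run1", "ex:executed", "ex:step2"),
]

def query(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the given fields; None acts as a wildcard."""
    return [
        (s, p, o) for s, p, o in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]

def who_launched(triples, run):
    """Answer the 'Who?' question for a workflow run."""
    return [o for _, _, o in query(triples, subject=run, predicate="ex:launchedBy")]

def organisations_for(triples, run):
    """Follow launchedBy, then belongsTo, to find the responsible organisation."""
    orgs = []
    for person in who_launched(triples, run):
        orgs += [o for _, _, o in query(triples, subject=person, predicate="ex:belongsTo")]
    return orgs

print(who_launched(triples, "ex:run1"))
print(organisations_for(triples, "ex:run1"))
```

Chaining two lookups, as `organisations_for` does, is a very small instance of the kind of question a triple store or a reasoner can answer over provenance records.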

Provenance Record

[Diagram: a typed workflow run (urn:lsid:..:wfInstance:8) linked to an input and several results. Provenance Ontology classes shown: WorkflowRun, ProcessRun, Workflow, Experimenter, Organization. Relations shown: runs, launchedBy, belongsTo, executed. Resource identifiers shown: urn:lsid:…:workflow:6, urn:lsid:…:person:4, urn:lsid:…:org:HY7, urn:lsid:…:processRun:84, urn:lsid:…:processRun:51]

Provenance Browser

New plans for Taverna 2.0

Evolving challenges

• Long running data intensive workflows

• Manipulation of confidential or otherwise protected information

• Use with classical grid systems

• Publishing and sharing of workflows

• Better use of provenance

Runtime Service Binding

• Service definition consists of an abstract description

• Resolved at workflow runtime to one or more concrete resources by a broker

• Allows load balancing or economic model based service selection over grid environments
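Runtime service binding can be sketched as a broker that resolves an abstract service name to one of several concrete endpoints. The `Broker` class, the service name, and the endpoint names below are all hypothetical, and the least-loaded policy is just one simple example of a selection rule; it is not taken from the Taverna 2.0 design.

```python
class Broker:
    """Maps an abstract service name to concrete endpoints at run time."""
    def __init__(self):
        self.registry = {}   # abstract name -> list of endpoint names
        self.load = {}       # endpoint name -> number of calls assigned so far

    def register(self, abstract_name, endpoint):
        self.registry.setdefault(abstract_name, []).append(endpoint)
        self.load.setdefault(endpoint, 0)

    def resolve(self, abstract_name):
        """Pick the least-loaded endpoint (a simple load-balancing policy)."""
        candidates = self.registry.get(abstract_name)
        if not candidates:
            raise LookupError(f"no concrete service for {abstract_name!r}")
        chosen = min(candidates, key=lambda e: self.load[e])
        self.load[chosen] += 1
        return chosen


broker = Broker()
broker.register("sequence-alignment", "node-a")
broker.register("sequence-alignment", "node-b")
picks = [broker.resolve("sequence-alignment") for _ in range(4)]
print(picks)
```

Swapping the `key` function in `resolve` for a cost estimate would turn the same structure into an economic-model selection.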

Processor Dispatch Stack

3rd party data transfers

• Allows ‘in place’ referencing of data
– Large data sets no longer round-trip between workflow engine and data provider
– Allows restricted access to sensitive data

• Automatic de-reference when a reference type is linked to a value type within a workflow.
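The reference-passing idea can be sketched as follows. `DataReference`, `link`, the `store://` location scheme, and the in-memory store are all assumptions made for this example. A reference travels through the workflow untouched and is only resolved when it reaches an input that needs the actual value.

```python
class DataReference:
    """Points at data held by a provider instead of carrying the data itself."""
    def __init__(self, location, fetch):
        self.location = location
        self._fetch = fetch      # callable that retrieves the data when needed

    def dereference(self):
        return self._fetch(self.location)


def link(value_or_ref, expects_reference):
    """Deliver an item to a downstream input port.

    A reference is passed through untouched when the port accepts references,
    and is dereferenced automatically when the port expects a plain value.
    """
    if isinstance(value_or_ref, DataReference) and not expects_reference:
        return value_or_ref.dereference()
    return value_or_ref


# Hypothetical in-memory store standing in for a remote data provider.
store = {"store://genome/1": "acatttctac"}
fetches = []

def fetch(location):
    fetches.append(location)   # record each real data transfer
    return store[location]

ref = DataReference("store://genome/1", fetch)
passed = link(ref, expects_reference=True)    # no data is moved
value = link(ref, expects_reference=False)    # data is fetched once, here
print(passed.location, value, fetches)
```

Only the second call triggers a transfer, so a large data set handed from one reference-aware service to another never passes through the workflow engine.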

Streaming Data

• Allow execution of downstream workflow stages on partially complete results from upstream.

[Diagram: Service 1 → Service 2 → Service 3, shown in two modes]

– Non-streaming (Taverna 1): the entire iteration must complete at each stage.

– Streamed data: Service 2 starts operating on partial results from Service 1.
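The difference between the two modes can be shown with Python generators. The two toy services and the `run_batch` and `run_streamed` helpers are hypothetical; the log records the order in which each service touches each item.

```python
def service1(items, log):
    for item in items:
        log.append(f"s1:{item}")
        yield item * 2

def service2(items, log):
    for item in items:
        log.append(f"s2:{item}")
        yield item + 1

def run_batch(items):
    """Non-streaming: each stage finishes the whole list before the next starts."""
    log = []
    stage1 = list(service1(items, log))
    stage2 = list(service2(stage1, log))
    return stage2, log

def run_streamed(items):
    """Streaming: service2 consumes each result as soon as service1 yields it."""
    log = []
    stage2 = list(service2(service1(items, log), log))
    return stage2, log

batch_result, batch_log = run_batch([1, 2, 3])
stream_result, stream_log = run_streamed([1, 2, 3])
print(batch_log)
print(stream_log)
```

Both modes produce the same results; only the interleaving changes, which is what lets downstream stages begin before upstream stages have finished.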

Conclusions

• Taverna and its source code are free to download.
– http://taverna.sourceforge.net

• Taverna is being adopted by a number of different disciplines outside its bio-science origins, including chemoinformatics, social science, astronomy.

• Open architecture and support for plugins to cope with open world – allows expansion into other areas

• User driven development
– Taverna users mailing list
– Taverna hackers mailing list

• Production quality software within OMII-UK

Acknowledgements

• The myGrid group, past and present.
• OMII-UK
• All our users

• Carole Goble
• Katy Wolstencroft
• Daniele Turi
• Matthew Gamble
• Tom Oinn
• Paul Fisher
