Post on 14-Jan-2016


Page 1: Taverna Workbench Stuart Owen University of Manchester, UK stuart.owen@manchester.ac.uk

Taverna Workbench

Stuart Owen
University of Manchester, UK

stuart.owen@manchester.ac.uk

Page 2

What is a workflow

• Data workflows
  – A task is invoked once its expected data has been received, and when complete passes any resulting data downstream.
  – B starts when it receives data from A.
  – C and D run in parallel when they receive data from B.
  – E starts once it has received data from both C and D.

• Control workflows
  – A task is invoked once the tasks it depends on have completed.
  – B starts when A has completed.
  – C and D run in parallel once B has completed.
  – E starts once both C and D have completed.

[Figure: example workflow graph with nodes A, B, C, D, E and F]
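The data workflow rule described on this slide can be sketched in a few lines of Python. This is only an illustration, not Taverna's enactor: the `UPSTREAM` table, `run_dataflow` and `label` are hypothetical names, and the graph covers only the A to E links that the bullets spell out.

```python
from collections import deque

# Hypothetical graph taken from the bullets: A -> B -> (C, D) -> E
UPSTREAM = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["C", "D"]}

def run_dataflow(upstream, task_fn):
    """Invoke each task once data from all of its upstream tasks has arrived."""
    downstream = {task: [] for task in upstream}
    for task, sources in upstream.items():
        for source in sources:
            downstream[source].append(task)

    received = {task: {} for task in upstream}
    # Tasks with no upstream dependencies can start immediately.
    ready = deque(task for task, sources in upstream.items() if not sources)
    order, results = [], {}
    while ready:
        task = ready.popleft()
        results[task] = task_fn(task, received[task])
        order.append(task)
        # Pass the result downstream; a consumer becomes ready
        # only when every expected input has been received.
        for consumer in downstream[task]:
            received[consumer][task] = results[task]
            if len(received[consumer]) == len(upstream[consumer]):
                ready.append(consumer)
    return order, results

def label(task, inputs):
    # Toy task: concatenate upstream outputs (sorted by producer) and add own name.
    return "".join(inputs[name] for name in sorted(inputs)) + task

order, results = run_dataflow(UPSTREAM, label)
print(order)
print(results["E"])
```

C and D become ready at the same moment, when B's output arrives. This sketch runs them one after the other, whereas a real enactor could run them in parallel.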

Page 3

Advantages of workflows

12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt
12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt
12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct
12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt
12421 taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt
12481 aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt
12541 ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg
12601 tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga
12661 tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc
12721 atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa
12781 taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa

Page 4

Advantages of workflows

• High-level abstraction
  – Easier to understand and modify.
  – Easier to describe and discuss with others.
  – Describes what you want to do, not how to do it.

• Automation

• Systematic

• Sharing and re-use
  – Either on its own, or within other workflows!

Page 5

Workflows within Taverna

• Predominantly based around the flow of data, but control constraints are allowed as well.

• Service oriented workflows. Services may or may not be grid enabled.

• High-level GUI approach separated from lower-level coding: you don’t have to be a coder to build a workflow.

• Enactment can take place separately from the GUI, allowing workflows to be executed from the command line or within other systems.

Page 6
Page 7

Taverna 1.4 Workbench

• Integral part of the myGrid project

• Java based, runs on Windows, Mac OS, Linux, Solaris

• Open source and user driven development

• Taverna in OMII-UK
  – Dedicated team of developers focused on design, implementation, testing and support, leading to production quality software.
  – Development of Taverna 2.0

Page 8

Taverna 1.4 workbench

Page 9

Freefluo Workflow enactor

[Architecture diagram: the Taverna Workbench and the Enactor sit on top of Scufl + Workflow Object Model. Processor types shown include WebService, Soaplab, LocalApp, BioMOBY and an open-ended "?".]

SCUFL (Simple Conceptual Unified Flow Language)

Workflow Execution

• Application data flow layer: Scufl graph + service introspection
• Execution flow layer: list management; implicit iteration mechanism; MIME & semantic type decoration; fault management; service alternates
• Processor invocation layer

Page 10

Nested workflows

• A processor can be a workflow itself.

• Encourages the reuse of workflows within a more complex scenario.

• Greater abstraction of an overall process, making it more manageable.

Page 11
Page 12

Iterations

• Scufl handles iterations implicitly
  • i.e. Taverna handles it automagically; there's no need for the user to indicate that an iteration is required.
  • Taverna recognises the data mismatch and repeatedly runs the task over each data element in the list.

• Iteration strategy with multiple inputs can be configured.

  • “Cross product”: all against all

  • “Dot product”: first against first, second against second, etc.
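The two iteration strategies can be illustrated with a short Python sketch. It is not Scufl or Taverna code: `iterate` and `join` are hypothetical helpers, and the standard library's `itertools.product` and built-in `zip` stand in for the cross and dot products.

```python
from itertools import product

def iterate(task, input_lists, strategy="cross"):
    """Run a single-item task over list inputs using the chosen strategy."""
    if strategy == "cross":
        combinations = product(*input_lists)   # all against all
    elif strategy == "dot":
        combinations = zip(*input_lists)       # first with first, second with second
    else:
        raise ValueError(f"unknown strategy: {strategy}")
    return [task(*args) for args in combinations]

def join(left, right):
    # Toy task that expects one item per input, not a list.
    return f"{left}{right}"

cross = iterate(join, [["a", "b"], ["1", "2"]], "cross")
dot = iterate(join, [["a", "b"], ["1", "2"]], "dot")
print(cross)
print(dot)
```

With two inputs of two items each, the cross product yields four invocations and the dot product yields two.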

Page 13

What about when a service fails?

• Most services are owned by other people
• No control over service failure
• Some are research level
• Workflows are only as good as the services they connect!
• To help, Taverna can:
  • Notify failures
  • Instigate retries
  • Set criticality
  • Substitute alternative services
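The four mechanisms listed above (notification, retries, criticality and alternates) can be combined in one generic sketch. This is an assumption-laden illustration and not Taverna's fault-management code: `invoke`, `primary`, `backup` and the `max_retries` and `critical` parameters are hypothetical names.

```python
def invoke(services, payload, max_retries=1, critical=True):
    """Try each service in turn, retrying before moving on to an alternate."""
    failures = []  # collected so the caller can be notified of every failure
    for service in services:
        for attempt in range(1, max_retries + 2):
            try:
                return service(payload), failures
            except Exception as exc:
                failures.append((service.__name__, attempt, str(exc)))
    if critical:
        # A critical task failing stops the whole workflow.
        raise RuntimeError(f"all services failed: {failures}")
    # A non-critical task reports its failures and lets the workflow continue.
    return None, failures

def primary(payload):
    raise ConnectionError("primary service unavailable")

def backup(payload):
    return f"backup:{payload}"

result, failures = invoke([primary, backup], "seq")
skipped, skipped_failures = invoke([primary], "seq", critical=False)
print(result, len(failures))
```

Here the primary service is tried twice (one call plus one retry) before the alternate service supplies the result.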

Page 14

Provenance Data?

• Supports scientific method and best practice

• Metadata about the origin of a resource (workflow, service, data, experiment hypothesis, etc.) and the process by which a resource was generated.

• The Who?, What?, When?, Where? and Why? about resources.

• Stored as RDF triples

• Also available as OWL, opening it up to complex reasoning

[Figure: Provenance Record, showing an Input and five Results]

Page 15

Typed Workflow Run

[Diagram: Provenance Ontology with classes WorkflowRun, Workflow, ProcessRun, Experimenter and Organization, connected by the properties runs, launchedBy, belongsTo and executed. Instance nodes: urn:lsid:..:wfInstance:8, urn:lsid:…:workflow:6, urn:lsid:…:person:4, urn:lsid:…:org:HY7, urn:lsid:…:processRun:84 and urn:lsid:…:processRun:51.]
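To make the triple structure concrete, the workflow run on this slide can be written as subject, predicate, object tuples in Python. The edge directions below are an assumed reading of the flattened diagram (a workflow run runs a workflow, is launched by an experimenter who belongs to an organization, and executes process runs), and the LSIDs are copied with their elisions intact. `TRIPLES`, `RUN` and `objects` are hypothetical names, and no RDF library is used.

```python
RUN = "urn:lsid:..:wfInstance:8"

# Assumed edges; the identifiers are elided on the slide and left as-is here.
TRIPLES = [
    (RUN, "runs", "urn:lsid:…:workflow:6"),
    (RUN, "launchedBy", "urn:lsid:…:person:4"),
    ("urn:lsid:…:person:4", "belongsTo", "urn:lsid:…:org:HY7"),
    (RUN, "executed", "urn:lsid:…:processRun:84"),
    (RUN, "executed", "urn:lsid:…:processRun:51"),
]

def objects(triples, subject, predicate):
    """Return every object linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]

# "Who?" question: which organization does the launcher of this run belong to?
launcher = objects(TRIPLES, RUN, "launchedBy")[0]
organization = objects(TRIPLES, launcher, "belongsTo")[0]
process_runs = objects(TRIPLES, RUN, "executed")
print(organization, process_runs)
```

Following two predicates in sequence answers a provenance question that no single record states directly, which is the kind of query a triple store supports.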

Page 16

Provenance Browser

Page 17

New plans for Taverna 2.0

Page 18

Evolving challenges

• Long running data intensive workflows

• Manipulation of confidential or otherwise protected information

• Use with classical grid systems

• Publishing and sharing of workflows

• Better use of provenance

Page 19

Runtime Service Binding

• Service definition consists of an abstract description

• Resolved at workflow runtime to one or more concrete resources by a broker

• Allows load balancing or economic model based service selection over grid environments
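A broker of the kind described can be sketched as a registry that maps an abstract service name to candidate endpoints and picks one at run time. This is a hypothetical illustration, not the Taverna 2.0 design: `Endpoint`, `Broker`, the `load` and `cost` fields and the example URLs are all assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Endpoint:
    url: str      # concrete resource (hypothetical address)
    load: float   # current load, 0.0 (idle) to 1.0 (saturated)
    cost: float   # price per invocation in an economic model

class Broker:
    """Resolves an abstract service description to a concrete endpoint."""

    POLICIES = ("load", "cost")

    def __init__(self):
        self._registry = {}

    def register(self, abstract_name, endpoint):
        self._registry.setdefault(abstract_name, []).append(endpoint)

    def resolve(self, abstract_name, policy="load"):
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        candidates = self._registry.get(abstract_name)
        if not candidates:
            raise LookupError(f"no endpoint for {abstract_name}")
        # Pick the candidate with the lowest value for the chosen policy.
        return min(candidates, key=lambda endpoint: getattr(endpoint, policy))

broker = Broker()
broker.register("sequence-alignment", Endpoint("http://site-a.example/align", 0.9, 1.0))
broker.register("sequence-alignment", Endpoint("http://site-b.example/align", 0.2, 5.0))

by_load = broker.resolve("sequence-alignment", "load")
by_cost = broker.resolve("sequence-alignment", "cost")
print(by_load.url, by_cost.url)
```

The workflow only ever names "sequence-alignment"; which site serves the request depends on the policy in force when the workflow runs.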

Page 20

Processor Dispatch Stack

Page 21

3rd party data transfers

• Allows ‘in place’ referencing of data
  – Large data sets no longer round-trip between workflow engine and data provider
  – Allows restricted access to sensitive data

• Automatic de-reference when a reference type is linked to a value type within a workflow.
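The reference and value distinction might look like the following sketch, in which a dictionary stands in for a remote data provider. `DataRef`, `feed`, `STORE` and the port kinds are hypothetical names chosen for illustration. Nothing here reflects Taverna's actual reference types.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass(frozen=True)
class DataRef:
    """A pointer to data held by a provider, not the data itself."""
    locator: str
    fetch: Callable[[str], bytes]

    def dereference(self) -> bytes:
        return self.fetch(self.locator)

def feed(port_kind, item):
    """Hand a reference on untouched, or dereference it for a value port."""
    if isinstance(item, DataRef) and port_kind == "value":
        return item.dereference()
    return item

# Stand-in for a data provider; a real one would sit behind a network call.
STORE = {"seq/1": b"acatttctac"}
ref = DataRef("seq/1", STORE.__getitem__)

passed_on = feed("reference", ref)   # data stays with the provider
loaded = feed("value", ref)          # data is fetched only when a value is needed
print(passed_on.locator, loaded)
```

Services that accept references never pull the data through the workflow engine; the fetch happens only at the point where a value is actually required.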

Page 22

Streaming Data

• Allow execution of downstream workflow stages on partially complete results from upstream.

[Figure: pipeline of Service 1, Service 2 and Service 3, shown in two modes]

• Non-streaming (Taverna 1): the entire iteration must complete at each stage.
• Streamed data: Service 2 starts operating on partial results from Service 1.
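The difference between the two modes can be shown with Python generators, which hand each item downstream as soon as it is produced. This is a toy model under stated assumptions, not the Taverna 2 engine: `service1`, `service2`, `run_batch` and `run_streamed` are hypothetical, and the log records the order in which the stages do their work.

```python
def service1(items, log):
    for item in items:
        log.append(f"s1:{item}")
        yield item * 2

def service2(items, log):
    for item in items:
        log.append(f"s2:{item}")
        yield item + 1

def run_batch(data):
    """Non-streaming: stage 1 finishes the whole list before stage 2 starts."""
    log = []
    stage1 = list(service1(data, log))
    output = list(service2(stage1, log))
    return output, log

def run_streamed(data):
    """Streaming: stage 2 consumes each stage 1 result as soon as it exists."""
    log = []
    output = list(service2(service1(data, log), log))
    return output, log

batch_output, batch_log = run_batch([1, 2])
stream_output, stream_log = run_streamed([1, 2])
print(batch_log)
print(stream_log)
```

Both modes produce the same output; only the interleaving in the log differs, with the streamed run starting the second service before the first has processed its second item.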

Page 23

Conclusions

• Taverna and its source code are free to download.
  – http://taverna.sourceforge.net

• Taverna is being adopted by a number of different disciplines outside its bio-science origins, including chemoinformatics, social science and astronomy.

• Open architecture and support for plugins to cope with an open world, allowing expansion into other areas

• User driven development
  – Taverna users mailing list
  – Taverna hackers mailing list

• Production quality software within OMII-UK

Page 24

Acknowledgements

• The myGrid group, past and present.
• OMII-UK
• All our users

• Carole Goble
• Katy Wolstencroft
• Daniele Turi
• Matthew Gamble
• Tom Oinn
• Paul Fisher