Taverna in 2006, Industry Workshop, [email protected]@ebi.ac.uk, 8th March 2006
Taverna in 2006
Industry Workshop,
8th March 2006
Taverna 1
3 Years old, 1300 downloads in latest release over two months.
Expanding community covering an increasing variety of domains
Originally funded as part of an EPSRC pilot project, research rather than production focus
A success but with limitations
Taverna 1.3.1 Workbench
Evolving challenges
Long running data intensive workflows
Manipulation of confidential or otherwise protected information
Use with classical grid systems
Interaction with users during workflows
Workflow authoring, service discovery and composition
Data comprehension, provenance and visualization
User Interaction Handling
Interaction Service and corresponding Taverna processor allows a workflow to call out to an expert human user
Used to embed the Artemis annotation editor within an otherwise automated genome annotation pipeline
Interaction Service Architecture
[Diagram labels: Taverna 1.3, Proxy, InteractionStore, Patterns, Pattern, Submit, Status, Results, Upload, Download]
DALEC – Linking Taverna and DAS
DALEC exposes a Taverna workflow as a Distributed Annotation System (DAS) annotation source.
– Design workflow in Taverna
– Deploy in DALEC
– Access through any DAS client (Spice, Ensembl web server etc)
[Diagram labels: Standard DAS Service, DALEC DAS Service]
Taverna 2
Funded as part of OMII-UK
10 Developers
Dedicated design, implementation, testing and support team
First new developers started three weeks ago, project manager arriving in April
[Diagram labels: Ingest; Pioneers, Early adopters, Conservatives; myGrid Pre-release, myGrid Release, OMII-UK Release; Software Engineering, XP, OMII Software Engineering; Quality & Test; Evaluation; Prioritise & Plan; Production; Applications & Professional Services; myGrid Alliance; Source-forge community]
Future Direction
Enhancements to the Workflow Core
Enhancements to user interface and experience
Expanded use of semantic web technologies
Engagement with new user communities – cheminformatics, humanities, social sciences etc.
Code remains open source and always will
Composite Workflow Models
Enhanced Dataflow Model
Modular dispatcher mechanism
– Dynamic service binding
– Recursive invocation
– Data filter implementation
– Retry, failover, back-off behaviours
Transparent third party data transfers
High throughput stream handling with implicit iteration semantics
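The retry, failover and back-off behaviours named in the dispatcher list can be pictured as layers wrapped around a single service invocation. The following is a minimal sketch under stated assumptions, not Taverna's dispatcher: the function name `dispatch`, its parameters, and the choice to treat every exception as retryable are all hypothetical.

```python
import time


def dispatch(candidates, data, retries=2, base_delay=0.0, sleep=time.sleep):
    """Invoke the first candidate service that succeeds.

    candidates: callables standing in for equivalent services (hypothetical).
    retries:    extra attempts per candidate before failing over.
    base_delay: first back-off delay in seconds; doubled after each failure.
    sleep:      injectable so the back-off can be observed without waiting.
    """
    last_error = None
    for service in candidates:              # failover: move to the next candidate
        delay = base_delay
        for attempt in range(retries + 1):  # retry: same candidate again
            try:
                return service(data)
            except Exception as err:        # assumption: any failure is retryable
                last_error = err
                if attempt < retries:
                    sleep(delay)            # back-off: wait, then double the delay
                    delay *= 2
    raise RuntimeError("all candidate services failed") from last_error
```

Keeping each behaviour as a separate loop or wrapper is one way to make the mechanism modular: a layer can be removed or reordered without touching the invocation itself.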
Runtime Service Binding
Service definition consists of an abstract description
Resolved at workflow runtime to one or more concrete resources by a broker
Allows load balancing or economic model based service selection over grid environments
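As a rough illustration of late binding, the sketch below resolves an abstract service name to one concrete endpoint at run time, using either a load-based or a cost-based policy. The `Endpoint` fields, the `Broker` class and the example URLs are assumptions made for illustration and are not part of Taverna.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Endpoint:
    """A concrete resource; all three fields are illustrative."""
    url: str
    load: float   # current utilisation, 0.0 (idle) to 1.0 (saturated)
    cost: float   # price per invocation in arbitrary units


class Broker:
    """Maps an abstract service description to registered concrete endpoints."""

    def __init__(self):
        self._registry = {}

    def register(self, abstract_name, endpoint):
        self._registry.setdefault(abstract_name, []).append(endpoint)

    def resolve(self, abstract_name, policy="load"):
        """Pick an endpoint at run time: least loaded, or cheapest."""
        candidates = self._registry.get(abstract_name, [])
        if not candidates:
            raise LookupError(f"no concrete resource for {abstract_name!r}")
        if policy == "load":
            return min(candidates, key=lambda e: e.load)
        return min(candidates, key=lambda e: e.cost)


broker = Broker()
broker.register("sequence_alignment", Endpoint("http://a.example/align", load=0.9, cost=1.0))
broker.register("sequence_alignment", Endpoint("http://b.example/align", load=0.2, cost=5.0))
chosen = broker.resolve("sequence_alignment")   # least loaded endpoint
```

Because the workflow only names `sequence_alignment`, the same definition can run against whichever resources are registered when it is enacted.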
Recursive Invocation
Dispatcher allowing recursive invocation to be plugged into per operation semantics.
[Diagram labels: Receive Input, Invoke operation, Test for completion, Modify Input Set, Gather Result Set, Return Result]
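One plausible reading of the recursive invocation diagram is a loop that re-invokes an operation with a modified input until a completion test passes. The sketch below follows that reading. The ordering of the steps, the function name and the `max_rounds` guard are assumptions, since the slide gives only the step labels.

```python
def invoke_recursively(operation, is_complete, modify_input, first_input, max_rounds=100):
    """Repeat `operation` until `is_complete` accepts a result.

    All four callables are supplied by the caller; nothing here is Taverna API.
    """
    results = []
    current = first_input                        # Receive Input
    for _ in range(max_rounds):
        result = operation(current)              # Invoke operation
        results.append(result)                   # Gather Result Set
        if is_complete(result):                  # Test for completion
            return results                       # Return Result
        current = modify_input(current, result)  # Modify Input Set
    raise RuntimeError("no completion within max_rounds")


# Usage: a hypothetical paged service returning (payload, next_page_or_None).
pages = {0: ("a", 1), 1: ("b", 2), 2: ("c", None)}
fetched = invoke_recursively(
    operation=lambda page: pages[page],
    is_complete=lambda result: result[1] is None,
    modify_input=lambda current, result: result[1],
    first_input=0,
)
```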
Dynamic Dispatch Configuration
3rd Party Data Transfers
Allows ‘in place’ referencing of data
– Large data sets no longer round-trip between workflow engine and data provider
– Allows restricted access to sensitive data
Automatic de-reference when a reference type is linked to a value type within a workflow.
– Connecting a grid service to a web service
[Diagram: logical workflow structure defined by user (Service 1, Service 2, Service 3); Workflow Enactor containing the Enactment Engine; Provider A hosting Service 1 and Service 2; Provider B hosting Service 3]
The client pushes the workflow input data value to the workflow enactor, which stores the value in a local cache for future use.
Workflow enactor sends cached data value to Service 1.
Service 1 completes and stores its result value in a local data store, for example SRB, on the same host (Provider A). It returns a reference to that value to the workflow enactor.
The enactor examines the workflow and determines that Service 2 understands the reference it has to the Service 1 result. It sends this reference to Service 2 which uses it to directly access the local data store.
Service 2 completes, stores its result in the local store and returns a reference to that data to the enactor.
The enactor examines Service 3. This service, located on another provider, cannot consume the reference returned from Service 2. The enactor forces a de-reference, requesting and caching the value of that reference from Provider A.
As the enactor now has a value rather than a reference it can invoke Service 3, which is fed data from the enactor local cache, operates over that data and returns a result which is in turn cached by the enactor.
The workflow is complete; the enactor sends the final result back to the client.
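The walkthrough above can be condensed into one decision the enactor makes before each invocation: pass a reference through untouched, or de-reference it into a value. The sketch below models only that decision. The `Ref`, `Provider` and `Enactor` classes are hypothetical, and the rule that a reference is usable only on the provider that holds the data is an assumption drawn from this one example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Ref:
    """A reference to a value held in a provider's local data store."""
    provider: str
    key: str


class Provider:
    def __init__(self, name):
        self.name = name
        self.store = {}          # stands in for a local store such as SRB

    def put(self, key, value):
        self.store[key] = value
        return Ref(self.name, key)


class Enactor:
    def __init__(self, providers):
        self.providers = {p.name: p for p in providers}
        self.transfers = 0       # counts values pulled through the enactor

    def prepare_input(self, data, service_provider, accepts_refs):
        """Return what should be sent to the next service."""
        if not isinstance(data, Ref):
            return data                      # already a plain value
        if accepts_refs and data.provider == service_provider:
            return data                      # service reads the store directly
        self.transfers += 1                  # forced de-reference
        return self.providers[data.provider].store[data.key]


provider_a, provider_b = Provider("A"), Provider("B")
enactor = Enactor([provider_a, provider_b])
ref = provider_a.put("service1-result", "ACGT")
to_service2 = enactor.prepare_input(ref, "A", accepts_refs=True)    # stays a Ref
to_service3 = enactor.prepare_input(ref, "B", accepts_refs=False)   # becomes a value
```

In this model the data crosses the enactor only once, at the boundary between Provider A and Provider B, which is the saving the slides describe for large data sets.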
Streaming Data
Allow execution of downstream workflow stages on partially complete results from upstream.
[Diagram: Service 1, Service 2, Service 3]
Non streaming (Taverna 1), entire iteration must complete at each stage
Streamed data, Service 2 starts operating on partial results from Service 1
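The difference between the two modes can be shown with Python generators, used here purely as an analogy for streamed workflow stages. The service functions and the log format are invented for the example; only the ordering of events is the point.

```python
def service1(items, log):
    for item in items:
        log.append(f"s1:{item}")
        yield item * 2


def service2(items, log):
    for item in items:
        log.append(f"s2:{item}")
        yield item + 1


def run_batch(items, log):
    """Non-streaming: Service 1 finishes its whole iteration before Service 2 starts."""
    stage1 = list(service1(items, log))
    return list(service2(stage1, log))


def run_streamed(items, log):
    """Streaming: Service 2 consumes each Service 1 result as soon as it exists."""
    return list(service2(service1(items, log), log))


batch_log, stream_log = [], []
batch_result = run_batch([1, 2], batch_log)
stream_result = run_streamed([1, 2], stream_log)
```

Both runs produce the same results; the logs differ only in order, with the streamed run interleaving the two services.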
New UI Development
Smart graph editing module
3d ‘virtual reality’ style enactment status display
Data playground – design workflows by example
Integrated semantic search
Knowledge driven visualization for result mining
KAVE Data and metadata management
Life Science Identifiers
Information Model
File management
Support for custom database building
Provenance metadata capture using RDF
SRB integration
OGSA-DAI integration
[Figure: provenance graph linking data, service invocations and concepts.
Legend: Data generated by services/workflows; Concepts; Services; literals; Properties.
Data nodes: urn:data1, urn:data2, urn:data12, urn:data:3, urn:data:f1, urn:data:f2, urn:hit1…, urn:hit2…., urn:hit5…, urn:hit8…., urn:hit10….., urn:hit50…..
Invocation nodes: urn:BlastNInvocation3, urn:compareinvocation3, urn:invocation5
Concepts and types: Blast_report, SwissProt_seq, Sequence_hit, DatumCollection, LSDatum
Properties: [input], [output], [instanceOf], [type], [contains], [hasHits], [hasName], [performsTask], [similar_sequence_to], [directlyDerivedFrom], [distantlyDerivedFrom]
Other labels: Find similar sequence, New sequence, Missed sequence]
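As a rough illustration of provenance captured as RDF-style statements, the sketch below stores subject, predicate, object triples in a plain list and walks derivation links backwards. The predicate names `input`, `output` and `directlyDerivedFrom` are borrowed from the figure; the `urn:example:` identifiers, the function names and the rule that every output derives from every input are assumptions, and no RDF library is used.

```python
def record_invocation(triples, invocation, inputs, outputs):
    """Append input/output statements plus derivation links between them."""
    for item in inputs:
        triples.append((invocation, "input", item))
    for out in outputs:
        triples.append((invocation, "output", out))
        for item in inputs:
            # assumption: each output is derived from every input
            triples.append((out, "directlyDerivedFrom", item))


def derived_from(triples, datum):
    """Return everything `datum` was derived from, directly or through a chain."""
    found, frontier = set(), [datum]
    while frontier:
        node = frontier.pop()
        for subject, predicate, obj in triples:
            if subject == node and predicate == "directlyDerivedFrom" and obj not in found:
                found.add(obj)
                frontier.append(obj)
    return found


triples = []
record_invocation(triples, "urn:example:invocation1",
                  ["urn:example:sequence"], ["urn:example:report"])
record_invocation(triples, "urn:example:invocation2",
                  ["urn:example:report"], ["urn:example:hits"])
ancestors = derived_from(triples, "urn:example:hits")
```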
Computational Steering
Scientist designs, initiates and steers simulation from Taverna Workbench
Workflow definition sent to enactor
Steering of simulations by manipulation of service state
Process and data provenance captured and stored by metadata services
[Diagram labels: Scientists, Workflow Workbench, Steering Control, Enactor, Process 1, Process 2, Process 3, myGrid Metadata Stores]
Service Types
Closer integration with grid systems, e.g. Condor and EGEE, and their associated security and access control mechanisms.
R for numerical analysis (microarray informatics amongst others)
Continued improvements to SOAP, BioMoby, Biomart, Soaplab, SGS, Local scripting and other components
Obtaining Taverna
Taverna is available under the LGPL from our project site on Sourceforge.net
– http://taverna.sourceforge.net
Release 1.3.1 as of December 2005
Win32, Solaris / Linux & OS-X
Includes online and downloadable user manual, examples etc.
Support via project mailing lists
myGrid team & Early adopters
Core
Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizauskas, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson and Chris Wroe.
Users
Simon Pearce and Claire Jennings, Institute of Human Genetics School of Clinical Medical Sciences, University of Newcastle, UK
Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK
Postgraduates
Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, John Dickman, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire
Industrial
Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM)
Robin McEntire (GSK)
Collaborators
Keith Decker