the design and implementation of a workflow analysis tool

The design and implementation of a workflow analysis tool Vasa Curcin Department of Computing Imperial College London


Post on 22-Feb-2016



Page 1: The design and implementation of a workflow analysis tool

The design and implementation of a workflow analysis tool

Vasa Curcin

Department of Computing

Imperial College London

Page 2: The design and implementation of a workflow analysis tool

Scientific workflow field

• Scientific workflows: a high-level programming language with explicit graphical representation of flow of data and/or control

• Research into automation of processes supporting scientific research

• Significant role in providing middleware for UK eScience programme: Taverna, Discovery Net, Triana

• Lingua franca of service-oriented computing

Page 3: The design and implementation of a workflow analysis tool

Deluge of workflows

Workflow systems: Meandre, Taverna, Discovery Net, Triana, Kepler, KNIME, Orange, Pentaho, Pegasus, Trident, YAWL, BPEL, LONI, GenePatterns, Galaxy, VisTrails, UGENE, Wildfire

Application domains: Bioinformatics, Cheminformatics, Environmental Science, Business Intelligence, Astronomy, Sensor informatics

Page 4: The design and implementation of a workflow analysis tool

Workflow analysis

• There is a need for formal models to capitalize on the benefits of this infrastructure
  o Work evaluated on Discovery Net workflow
  o Concepts applicable to other workflow systems
• Some aims
  o Minimise cost of data movement and processing
  o Provide technology for workflow clients and warehouses (indexing, guided construction…)
• Tasks
  o Safeness
  o Instance bounds
  o Static workflow optimization
  o Establishing polymorphic type profiles of workflows

Page 5: The design and implementation of a workflow analysis tool

Underlying models

• Control flow model
  o Process calculus definitions
  o Communication along named channels
    • Fixed for atomic execution, dynamic for streaming
  o New instance of the process launched as soon as the node receives a token
  o Computation tree logic (CTL) modelling execution states
• Data flow model
  o Nodes associated with lambda calculus formulas and term graphs
  o Polymorphic type transformations
  o Rewrite rules defined for sets of nodes as term graph transformations
• Embedding
  o Way of combining the control and data semantics

Page 6: The design and implementation of a workflow analysis tool

Workflow analysis tool

• Similarity checker
  o Bisimilarity of processes
• Process profiler
  o Deadlock/livelock detection
  o Reachability
  o Task bounds
• Composability checker
  o Design-time tests
  o Type requirements
  o Polymorphic properties
• Equivalence checker
  o Functional equivalence
• Optimizer
  o Rewrite rules for transformations

Page 7: The design and implementation of a workflow analysis tool

Similarity checker

• Based purely on the pi-calculus process model
  o Workflows translated into the process model
  o Parallel composition of independent node processes with named channels
  o Compared in terms of:
    • Internal executions (node actions)
    • Set of observable outputs - define only relevant outputs
• Model checker used to test different types of bisimilarity
  o Node executions conveniently represented as silent actions
  o Strong bisimulation becomes strict one-to-one workflow action mapping
  o Weak bisimulation ignores internal actions and communications and focuses on visible outputs

Figure labels: Workflow, Process model, Model checker

Page 8: The design and implementation of a workflow analysis tool

Similarity checker: example

• ABC (Another Bisimulation Checker) used
• Model checker used to test different types of bisimilarity
  o Node executions conveniently represented as silent actions
  o Strong bisimulation becomes strict one-to-one workflow action mapping
  o Weak bisimulation ignores internal actions and communications and focuses on visible outputs
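The difference between strong and weak bisimulation can be illustrated with a minimal Python sketch. This is not the ABC tool or the pi-calculus translation used in the talk; it is a naive greatest-fixpoint check over two tiny hand-written labelled transition systems. The state names, the `out` action and all function names are assumptions made up for the illustration.

```python
from itertools import product

TAU = "tau"  # silent action, standing in for an internal node execution

def states_of(lts, start):
    """All states mentioned in a set of (source, action, target) triples."""
    return {start} | {s for (s, _, _) in lts} | {t for (_, _, t) in lts}

def step(lts, state, action):
    """Targets of a single `action` transition from `state`."""
    return {t for (s, a, t) in lts if s == state and a == action}

def tau_closure(lts, state):
    """States reachable from `state` by zero or more silent steps."""
    seen, stack = {state}, [state]
    while stack:
        for t in step(lts, stack.pop(), TAU):
            if t not in seen:
                seen.add(t)
                stack.append(t)
    return seen

def weak_step(lts, state, action):
    """Weak transition: tau* action tau*, or just tau* for a silent action."""
    before = tau_closure(lts, state)
    if action == TAU:
        return before
    mid = {t for s in before for t in step(lts, s, action)}
    return {u for t in mid for u in tau_closure(lts, t)}

def bisimilar(lts1, s1, lts2, s2, weak=False):
    """Start from all state pairs and delete pairs that fail the transfer condition."""
    actions = {a for (_, a, _) in lts1 | lts2}
    answer = weak_step if weak else step  # how the other side may respond
    rel = set(product(states_of(lts1, s1), states_of(lts2, s2)))
    changed = True
    while changed:
        changed = False
        for p, q in list(rel):
            forth = all(any((p2, q2) in rel for q2 in answer(lts2, q, a))
                        for a in actions for p2 in step(lts1, p, a))
            back = all(any((p2, q2) in rel for p2 in answer(lts1, p, a))
                       for a in actions for q2 in step(lts2, q, a))
            if not (forth and back):
                rel.discard((p, q))
                changed = True
    return (s1, s2) in rel

# Hypothetical workflows: one performs an internal step before its output,
# the other emits the output directly.
with_internal = {("s0", TAU, "s1"), ("s1", "out", "s2")}
direct = {("t0", "out", "t1")}

strong_result = bisimilar(with_internal, "s0", direct, "t0")
weak_result = bisimilar(with_internal, "s0", direct, "t0", weak=True)
```

Under these assumptions the two systems differ only by a silent step, so the strong check rejects them while the weak check, which looks only at visible outputs, accepts them.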

Page 9: The design and implementation of a workflow analysis tool

Process profiling

• The process algebra representation translated into a Kripke frame
  o Enumerated states denoting the number of instances of each workflow node
  o Transitions of the frame are the node executions
  o Use CTL formulas to query
  o NuSMV model checker employed
• Allows questions such as:
  o Reachability of a particular state
  o Detection of deadlocks and livelocks
  o Safety - some state always executing
  o Bounds on a number of instances of a node

Figure labels: Workflow, Process model, Kripke frame

Page 10: The design and implementation of a workflow analysis tool

Process profiling: example

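• Reachability
  o $\mathrm{EF}\ F^{\tau}_{1}$ – Is there an execution that achieves one instance of F
  o $\mathrm{AF}\ F^{\tau}_{1}$ – Do all executions always achieve one instance of F
• Livelocks
  o $\mathrm{AG}\,(C^{\tau} \rightarrow \mathrm{AG}\,\mathrm{AF}\ C^{\tau})$ – Is there always a livelock with C
  o $\mathrm{EF}\,(C^{\tau} \rightarrow \mathrm{AG}\,\mathrm{AF}\ C^{\tau})$ – Can there be a livelock with C
• Instance bounds
  o $\max x.\ \mathrm{EF}\ A^{\tau}_{x}$ – What is the maximum number of instances of A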
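The reachability and instance-bound queries can be sketched in Python on a small explicit Kripke frame. This is an illustration only, not NuSMV and not the tool's translation: the frame, the encoding of a state as a pair of instance counts `(A, F)`, and the function names are all assumptions. EF and AF are computed as the usual least fixpoints.

```python
def ef(frame, prop):
    """States from which SOME path eventually reaches a state satisfying prop."""
    sat = {s for s in frame if prop(s)}
    changed = True
    while changed:
        changed = False
        for s, succs in frame.items():
            if s not in sat and succs & sat:
                sat.add(s)
                changed = True
    return sat

def af(frame, prop):
    """States from which ALL paths eventually reach a state satisfying prop."""
    sat = {s for s in frame if prop(s)}
    changed = True
    while changed:
        changed = False
        for s, succs in frame.items():
            # A state qualifies once every successor already qualifies.
            if s not in sat and succs and succs <= sat:
                sat.add(s)
                changed = True
    return sat

def max_instances(frame, init, index):
    """Largest x such that EF (count[index] == x) holds in the initial state."""
    counts = {s[index] for s in frame}
    return max(x for x in counts
               if init in ef(frame, lambda s, x=x: s[index] == x))

# Hypothetical frame: each state is (instances of A, instances of F),
# each edge is one node execution.
frame = {
    (0, 0): {(1, 0)},
    (1, 0): {(2, 0), (1, 1)},
    (2, 0): {(2, 0)},   # A keeps running and F is never launched
    (1, 1): {(1, 1)},
}
init = (0, 0)

some_path_reaches_f = init in ef(frame, lambda s: s[1] == 1)
all_paths_reach_f = init in af(frame, lambda s: s[1] == 1)
bound_on_a = max_instances(frame, init, 0)
```

In this made-up frame one execution reaches an instance of F, but the loop on `(2, 0)` means not every execution does, so the EF query succeeds and the AF query fails; the bound on A is the largest count for which the EF query still holds.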

Page 11: The design and implementation of a workflow analysis tool

Composability checker

• Polymorphic type formulas for the workflow components/fragments

• When composing:
  o The output and input of each fragment compared in terms of free and bound type variables
  o If no clashes, free variables resolved to form the type formula of the composition
  o Inference engine developed specifically for the tool
• Determines:
  o If a workflow fragment can be reused on a new input
  o Which services in the warehouse are compatible

Figure labels: Workflow, Data model, Type formulas

Page 12: The design and implementation of a workflow analysis tool

Composability checker: example

• Fragment of three nodes LMN
  o Input q, with required attributes A, B, D
  o Two outputs u, v
  o A present in both. B in u. D in neither.
• Two outputs can be joined with O
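A rough Python sketch of the kind of check this example describes follows. It is a simplification and not the tool's inference engine: each port is reduced to a set of named attributes plus a flag for a free row variable ("any other attributes"), and it is assumed, beyond what the slide states, that both outputs pass unnamed attributes through. The `Port` class, the function names and the extra attribute `E` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Port:
    attrs: frozenset   # attributes named explicitly in the type formula
    open_row: bool     # True if a free variable stands for any further attributes

def resolve(required, supplied):
    """Bind the free variable to the extra attributes, or return None on a clash."""
    if required.attrs - supplied:
        return None                      # a required attribute is missing
    extra = supplied - required.attrs
    if extra and not required.open_row:
        return None                      # closed type cannot absorb extras
    return extra

def instantiate(output, rest):
    """Concrete attribute set of an output once the free variable is bound."""
    return output.attrs | (rest if output.open_row else frozenset())

# Fragment LMN from the slide: input q requires A, B, D;
# A appears in both outputs, B only in u, D in neither.
q = Port(frozenset("ABD"), open_row=True)
u = Port(frozenset("AB"), open_row=True)   # pass-through of extras is an assumption
v = Port(frozenset("A"), open_row=True)    # pass-through of extras is an assumption

new_input = frozenset("ABDE")              # hypothetical dataset with an extra attribute E
rest = resolve(q, new_input)
u_attrs = instantiate(u, rest)
v_attrs = instantiate(v, rest)

# Feeding output u back into q fails at design time because D is gone.
feedback = resolve(q, u_attrs)
```

Here the fragment can be reused on the new input with the free variable bound to `{E}`, while connecting `u` back to `q` is rejected before anything executes.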

Page 13: The design and implementation of a workflow analysis tool

Equivalence tester / optimizer

• Uses a set of node equivalence rules
  o Defined for each workflow system or node subset
  o Algorithm applies allowed transformations to reduce two workflows to the same expression
• Combined with rewrite heuristics
  o Node-specific again
  o Simple example: relational model again

Figure labels: Workflow, Data model, Node equivalences

Page 14: The design and implementation of a workflow analysis tool

Equivalence tester/optimizer: example

• Relational workflow searching for Adverse Drug Reactions in GPRD database
• Rewrite rules
  o Set of relational equivalences
• Heuristics
  o Early projections/selections
  o Late joins
  o Easy scenario – brute force algorithm works
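The "early selections" heuristic and the idea of reducing two workflows to the same expression can be sketched in Python with a single relational equivalence: a selection over a join may be moved onto the side of the join that supplies the selected column. This is a toy rewriter, not the tool's algorithm; the tuple encoding of expressions, the table names and the column names are invented for the illustration and are not the GPRD schema.

```python
def columns(expr):
    """Set of column names produced by an expression."""
    op = expr[0]
    if op == "scan":                      # ("scan", table_name, column_tuple)
        return set(expr[2])
    if op == "select":                    # ("select", predicate_column, child)
        return columns(expr[2])
    if op == "join":                      # ("join", left, right), natural join
        return columns(expr[1]) | columns(expr[2])
    raise ValueError(f"unknown operator: {op}")

def push_selections(expr):
    """Move each selection as close to the scans as the rule allows."""
    op = expr[0]
    if op == "scan":
        return expr
    if op == "join":
        return ("join", push_selections(expr[1]), push_selections(expr[2]))
    if op == "select":
        _, col, child = expr
        child = push_selections(child)
        if child[0] == "join":
            _, left, right = child
            if col in columns(left):
                return ("join", push_selections(("select", col, left)), right)
            if col in columns(right):
                return ("join", left, push_selections(("select", col, right)))
        return ("select", col, child)
    raise ValueError(f"unknown operator: {op}")

# Hypothetical tables for illustration only.
patients = ("scan", "patients", ("patient_id", "age"))
prescriptions = ("scan", "prescriptions", ("patient_id", "drug"))

late_filter = ("select", "drug", ("join", patients, prescriptions))
early_filter = ("join", patients, ("select", "drug", prescriptions))

normal_late = push_selections(late_filter)
normal_early = push_selections(early_filter)
equivalent = normal_late == normal_early
```

Both plans rewrite to the same expression, which is the sense in which the slide's brute-force approach can establish equivalence in easy relational scenarios; a fuller rule set would also cover projections and join reordering.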

Page 15: The design and implementation of a workflow analysis tool

Related and future work

• Data typing
  o COMAD for Kepler
• Workflow process analysis
  o GWorkflowDL
  o YAWL
• New workflow tools with relational structures
  o KNIME
  o Orange
  o Pentaho
• Extensions:
  o Streaming – blocking and batching
  o Improved state reduction algorithms for CTL model
  o Adding more type constructs for polymorphism

Page 16: The design and implementation of a workflow analysis tool

Summary

• Workflow analysis needed to improve takeup and exploitation of workflows
  o Enterprise environments
  o Profile resource usage, risk of failure, execution time
  o Support reuse and repurposing
• Separation of control and data aspects allows use of existing model checkers and familiar techniques
  o Process algebras, temporal logics, type polymorphisms, term graphs
• Current version works on Discovery Net/InforSense
  o KNIME, Pentaho very similar – only require extra parsers
  o Full streaming process model for Taverna in the works