The design and implementation of a workflow analysis tool

Post on 22-Feb-2016

Vasa Curcin

Department of Computing

Imperial College London

Scientific workflow field

• Scientific workflows: a high-level programming language with explicit graphical representation of flow of data and/or control
• Research into automation of processes supporting scientific research
• Significant role in providing middleware for UK eScience programme: Taverna, Discovery Net, Triana
• Lingua franca of service-oriented computing

Deluge of workflows

• Systems: Meandre, Taverna, Discovery Net, Triana, Kepler, KNIME, Orange, Pentaho, Pegasus, Trident, YAWL, BPEL, LONI, GenePatterns, Galaxy, VisTrails, UGENE, Wildfire
• Domains: Bioinformatics, Cheminformatics, Environmental Science, Business Intelligence, Astronomy, Sensor informatics

Workflow analysis

• There is a need for formal models to capitalize on the benefits of this infrastructure
  o Work evaluated on Discovery Net workflows
  o Concepts applicable to other workflow systems
• Some aims
  o Minimise cost of data movement and processing
  o Provide technology for workflow clients and warehouses (indexing, guided construction…)
• Tasks
  o Safeness
  o Instance bounds
  o Static workflow optimization
  o Establishing polymorphic type profiles of workflows

Underlying models

• Control flow model
  o Process calculus definitions
  o Communication along named channels
    • Fixed for atomic execution, dynamic for streaming
  o New instance of the process launched as soon as the node receives a token
  o Computation tree logic (CTL) modelling execution states
• Data flow model
  o Nodes associated with lambda calculus formulas and term graphs
  o Polymorphic type transformations
  o Rewrite rules defined for sets of nodes as term graph transformations
• Embedding
  o Way of combining the control and data semantics

Workflow analysis tool

• Similarity checker
  o Bisimilarity of processes
• Process profiler
  o Deadlock/livelock detection
  o Reachability
  o Task bounds
• Composability checker
  o Design-time tests
  o Type requirements
  o Polymorphic properties
• Equivalence checker
  o Functional equivalence
• Optimizer
  o Rewrite rules for transformations

Similarity checker

• Based purely on the pi-calculus process model
  o Workflows translated into the process model
  o Parallel composition of independent node processes with named channels
  o Compared in terms of:
    • Internal executions (node actions)
    • Set of observable outputs - define only relevant outputs
• Model checker used to test different types of bisimilarity
  o Node executions conveniently represented as silent actions
  o Strong bisimulation becomes strict one-to-one workflow action mapping
  o Weak bisimulation ignores internal actions and communications and focuses on visible outputs

[Diagram: Workflow → Process model → Model checker]

Similarity checker: example

• ABC (Another Bisimilarity Checker) used
• Model checker used to test different types of bisimilarity
  o Node executions conveniently represented as silent actions
  o Strong bisimulation becomes strict one-to-one workflow action mapping
  o Weak bisimulation ignores internal actions and communications and focuses on visible outputs
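The difference between the two notions can be illustrated with a minimal Python sketch. It does not use ABC or the pi-calculus; it assumes each workflow has already been flattened into a finite labelled transition system, and the state names, the `tau` label and the `out` label are hypothetical. Strong bisimilarity is computed as a greatest fixpoint over state pairs, and weak bisimilarity is obtained by first saturating the transitions with silent steps.

```python
from itertools import product

TAU = "tau"  # silent action: stands for an internal node execution (assumed label)


def states_of(trans, *extra):
    """Collect every state mentioned in the transition triples."""
    return {s for s, _, _ in trans} | {t for _, _, t in trans} | set(extra)


def strong_bisimilar(trans, p0, q0):
    """Greatest fixpoint: start from all pairs, delete pairs that cannot match moves."""
    states = states_of(trans, p0, q0)
    succ = {s: set() for s in states}
    for src, act, dst in trans:
        succ[src].add((act, dst))
    rel = set(product(states, repeat=2))
    changed = True
    while changed:
        changed = False
        for p, q in list(rel):
            # every move of p must be matched by q, and every move of q by p
            forth = all(any(a == b and (p2, q2) in rel for b, q2 in succ[q])
                        for a, p2 in succ[p])
            back = all(any(a == b and (p2, q2) in rel for a, p2 in succ[p])
                       for b, q2 in succ[q])
            if not (forth and back):
                rel.discard((p, q))
                changed = True
    return (p0, q0) in rel


def saturate(trans):
    """Build weak transitions: tau* a tau* for visible a, and tau* for tau."""
    states = states_of(trans)
    reach = {s: {s} for s in states}  # states reachable by zero or more tau steps
    changed = True
    while changed:
        changed = False
        for src, act, dst in trans:
            if act != TAU:
                continue
            for s in states:
                if src in reach[s] and dst not in reach[s]:
                    reach[s].add(dst)
                    changed = True
    weak = set()
    for s in states:
        for mid in reach[s]:
            weak.add((s, TAU, mid))
            for src, act, dst in trans:
                if src == mid and act != TAU:
                    weak.update((s, act, t) for t in reach[dst])
    return weak


def weak_bisimilar(trans, p0, q0):
    """Weak bisimilarity is strong bisimilarity on the saturated system."""
    return strong_bisimilar(saturate(trans), p0, q0)


# W1 runs one node before emitting its output, W2 runs two nodes first.
lts = {
    ("s0", TAU, "s1"), ("s1", "out", "s2"),
    ("t0", TAU, "t1"), ("t1", TAU, "t2"), ("t2", "out", "t3"),
}
print(strong_bisimilar(lts, "s0", "t0"))
print(weak_bisimilar(lts, "s0", "t0"))
```

The strong check fails because the second workflow performs an extra internal step that the first cannot mirror one-to-one, while the weak check succeeds because both workflows offer the same visible output. The quadratic pair relation is only practical for small examples.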

Process profiling

• The process algebra representation translated into a Kripke frame
  o Enumerated states denoting the number of instances of each workflow node
  o Transitions of the frame are the node executions
  o Use CTL formulas to query
  o NuSMV model checker employed
• Allows questions such as:
  o Reachability of a particular state
  o Detection of deadlocks and livelocks
  o Safety - some state always executing
  o Bounds on a number of instances of a node

[Diagram: Workflow → Process model → Kripke frame]

Process profiling: example

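• Reachability
  o $\mathrm{EF}\, F^{\tau}_{1}$ - Is there an execution that achieves one instance of F
  o $\mathrm{AF}\, F^{\tau}_{1}$ - Do all executions always achieve one instance of F
• Livelocks
  o $\mathrm{AG}\,(C^{\tau} \rightarrow \mathrm{AG}\,\mathrm{AF}\, C^{\tau})$ - Is there always a livelock with C
  o $\mathrm{EF}\,(C^{\tau} \rightarrow \mathrm{AG}\,\mathrm{AF}\, C^{\tau})$ - Can there be a livelock with C
• Instance bounds
  o $\max x .\, \mathrm{EF}\, A^{\tau}_{x}$ - What is the maximum number of instances of A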
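Queries of this kind can be tried on a toy frame with the sketch below. It is a hand-rolled explicit-state checker, not NuSMV, and the workflow is an assumption made for illustration: a single streaming node A that launches a new instance for each incoming token, with states encoded as hypothetical pairs `(tokens_left, running_instances)`. EF and AF are computed as least fixpoints over the successor relation.

```python
def build_frame(tokens):
    """Enumerate reachable states (tokens_left, running) of one streaming node."""
    init = (tokens, 0)
    succ, frontier = {}, [init]
    while frontier:
        state = frontier.pop()
        if state in succ:
            continue
        left, running = state
        nxt = set()
        if left > 0:
            nxt.add((left - 1, running + 1))  # token arrives: launch an instance
        if running > 0:
            nxt.add((left, running - 1))      # one instance finishes
        if not nxt:
            nxt.add(state)                    # self-loop keeps the relation total
        succ[state] = nxt
        frontier.extend(nxt)
    return init, succ


def ef(succ, target):
    """EF target: states with SOME path reaching the target set (least fixpoint)."""
    result = set(target)
    changed = True
    while changed:
        changed = False
        for s, nxt in succ.items():
            if s not in result and nxt & result:
                result.add(s)
                changed = True
    return result


def af(succ, target):
    """AF target: states from which ALL paths reach the target set (least fixpoint)."""
    result = set(target)
    changed = True
    while changed:
        changed = False
        for s, nxt in succ.items():
            if s not in result and nxt <= result:
                result.add(s)
                changed = True
    return result


def with_instances(succ, x):
    """States in which exactly x instances of the node are running."""
    return {s for s in succ if s[1] == x}


def max_instances(init, succ, upper):
    """Instance bound: the largest x such that EF (running == x) holds initially."""
    return max(x for x in range(upper + 1) if init in ef(succ, with_instances(succ, x)))


init, succ = build_frame(tokens=2)
print(init in ef(succ, with_instances(succ, 1)))  # some execution reaches one instance
print(init in af(succ, with_instances(succ, 1)))  # every execution reaches one instance
print(init in af(succ, with_instances(succ, 2)))  # every execution reaches two instances
print(max_instances(init, succ, upper=2))
```

On this frame the two-instance state is reachable but not inevitable, since the first instance may finish before the second token arrives. That is the distinction between the EF and AF forms of the reachability query.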

Composability checker

• Polymorphic type formulas for the workflow components/fragments
• When composing:
  o The output and input of each fragment compared in terms of free and bound type variables
  o If no clashes, free variables resolved to form the type formula of the composition
  o Inference engine developed specifically for the tool
• Determines:
  o If a workflow fragment can be reused on a new input
  o Which services in the warehouse are compatible

[Diagram: Workflow → Data model → Type formulas]

Composability checker: example

• Fragment of three nodes LMN
  o Input q, with required attributes A, B, D
  o Two outputs u, v
  o A present in both. B in u. D in neither.
• Two outputs can be joined with O
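The fragment above can be mimicked with a small Python sketch of attribute-level type checking. This is not the tool's inference engine. The `Port` and `Fragment` classes, the `instantiate` and `join` helpers, the extra attribute `E`, and the choice to forward all unrequired input attributes to both outputs are assumptions made for illustration. The unrequired attributes play the role of the free type variable that is resolved when the fragment meets a concrete input.

```python
from dataclasses import dataclass


@dataclass
class Port:
    fixed: frozenset          # attributes guaranteed on this output
    passes_rest: bool = True  # assumption: extra input attributes are forwarded


@dataclass
class Fragment:
    required: frozenset       # attributes the input must provide
    outputs: dict             # output name -> Port


def instantiate(fragment, input_attrs):
    """Resolve the free variable against a concrete input schema."""
    missing = fragment.required - input_attrs
    if missing:
        raise TypeError(f"input lacks required attributes: {sorted(missing)}")
    rest = input_attrs - fragment.required  # binding of the free type variable
    return {
        name: port.fixed | (rest if port.passes_rest else frozenset())
        for name, port in fragment.outputs.items()
    }


def join(left, right):
    """Hypothetical join node O: needs a shared attribute, yields the union."""
    if not left & right:
        raise TypeError("no common attribute to join on")
    return left | right


# Fragment LMN from the slide: requires A, B, D; A in both outputs, B in u, D in neither.
LMN = Fragment(
    required=frozenset("ABD"),
    outputs={"u": Port(frozenset("AB")), "v": Port(frozenset("A"))},
)

out = instantiate(LMN, frozenset("ABDE"))  # E is an extra, unrequired attribute
print(sorted(out["u"]), sorted(out["v"]))
print(sorted(join(out["u"], out["v"])))
```

Reusing the fragment on an input without D raises a `TypeError`, which corresponds to the design-time test of whether a fragment can be applied to a new input.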

Equivalence tester / optimizer

• Uses a set of node equivalence rules
  o Defined for each workflow system or node subset
  o Algorithm applies allowed transformations to reduce two workflows to the same expression
• Combined with rewrite heuristics
  o Node-specific again
  o Simple example: relational model again

[Diagram: Workflow → Data model → Node equivalences]

Equivalence tester/optimizer: example

• Relational workflow searching for Adverse Drug Reactions in GPRD database
• Rewrite rules
  o Set of relational equivalences
• Heuristics
  o Early projections/selections
  o Late joins
  o Easy scenario - brute force algorithm works
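One relational equivalence and the early-selection heuristic can be sketched in Python as follows. The tuple encoding of expressions, the table names `patients` and `prescriptions`, their columns, and the predicate string are all hypothetical and are not taken from the GPRD workflow. The single rule shown pushes a selection below a join when the predicate only mentions columns of one side; two workflows are then compared by rewriting both and checking for identical expressions.

```python
def schema(expr):
    """Columns produced by an expression."""
    kind = expr[0]
    if kind == "table":
        return frozenset(expr[2])
    if kind == "select":
        return schema(expr[3])
    if kind == "join":
        return schema(expr[1]) | schema(expr[2])
    raise ValueError(f"unknown node kind: {kind}")


def push_selections(expr):
    """Apply select(join(l, r)) == join(select(l), r) whenever the predicate allows."""
    kind = expr[0]
    if kind == "table":
        return expr
    if kind == "join":
        return ("join", push_selections(expr[1]), push_selections(expr[2]))
    if kind == "select":
        _, pred, cols, child = expr
        child = push_selections(child)
        if child[0] == "join":
            _, left, right = child
            if cols <= schema(left):   # predicate only needs the left input
                return ("join", push_selections(("select", pred, cols, left)), right)
            if cols <= schema(right):  # predicate only needs the right input
                return ("join", left, push_selections(("select", pred, cols, right)))
        return ("select", pred, cols, child)
    raise ValueError(f"unknown node kind: {kind}")


def equivalent(a, b):
    """Sound but incomplete test: equal after rewriting implies equivalent."""
    return push_selections(a) == push_selections(b)


patients = ("table", "patients", ("pid", "age"))
prescriptions = ("table", "prescriptions", ("pid", "drug"))
pred, cols = "drug = 'X'", frozenset({"drug"})

late = ("select", pred, cols, ("join", patients, prescriptions))   # filter after join
early = ("join", patients, ("select", pred, cols, prescriptions))  # filter before join

print(push_selections(late) == early)
print(equivalent(late, early))
```

With only one rule the comparison cannot recognise every pair of equivalent workflows, for example two stacked selections written in a different order. A fuller rule set would be needed for that.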

Related and future work

• Data typing
  o COMAD for Kepler
• Workflow process analysis
  o GWorkflowDL
  o YAWL
• New workflow tools with relational structures
  o KNIME
  o Orange
  o Pentaho
• Extensions:
  o Streaming - blocking and batching
  o Improved state reduction algorithms for CTL model
  o Adding more type constructs for polymorphism

Summary

• Workflow analysis needed to improve take-up and exploitation of workflows
  o Enterprise environments
  o Profile resource usage, risk of failure, execution time
  o Support reuse and repurposing
• Separation of control and data aspects allows use of existing model checkers and familiar techniques
  o Process algebras, temporal logics, type polymorphisms, term graphs
• Current version works on Discovery Net/InforSense
  o KNIME, Pentaho very similar - only require extra parsers
  o Full streaming process model for Taverna in the works
