Transcript
Page 1: Paper presentation: Taverna, reloaded

SSDBMHeidelberg, GermanyJune 30 - July 2, 2010

, reloaded

Paolo Missier, Stian Soiland-Reyes, Stuart Owen, Alex Nenadic,Ian Dunlop, Alan Williams, Carole Goble

School of Computer ScienceUniversity of Manchester, UK

Wei TanArgonne National Laboratory

Argonne, IL, USA

Tom OinnEBI, UK

Page 2: Paper presentation: Taverna, reloaded

Taverna: scientific workflow management

2http://www/myexperiment.org/workflows/746

Goal: perform cancer diagnosis using microarray analysis- learn a model for lymphoma type prediction based on samples from different lymphoma types

source: caGrid

Page 3: Paper presentation: Taverna, reloaded

Taverna: scientific workflow management

2http://www/myexperiment.org/workflows/746

Goal: perform cancer diagnosis using microarray analysis- learn a model for lymphoma type prediction based on samples from different lymphoma types

lymphoma samples ➔ hybridization datasource: caGrid

Page 4: Paper presentation: Taverna, reloaded

Taverna: scientific workflow management

2http://www/myexperiment.org/workflows/746

Goal: perform cancer diagnosis using microarray analysis- learn a model for lymphoma type prediction based on samples from different lymphoma types

lymphoma samples ➔ hybridization data

process microarray data as training dataset

source: caGrid

Page 5: Paper presentation: Taverna, reloaded

Taverna: scientific workflow management

2http://www/myexperiment.org/workflows/746

Goal: perform cancer diagnosis using microarray analysis- learn a model for lymphoma type prediction based on samples from different lymphoma types

lymphoma samples ➔ hybridization data

process microarray data as training dataset

learn predictive model

source: caGrid

Page 6: Paper presentation: Taverna, reloaded

Taverna: scientific workflow management

2http://www/myexperiment.org/workflows/746

Goal: perform cancer diagnosis using microarray analysis- learn a model for lymphoma type prediction based on samples from different lymphoma types

lymphoma samples ➔ hybridization datasource: caGrid

Page 7: Paper presentation: Taverna, reloaded

• Taverna for data-intensive science– architectural goals... and how they are being achieved

• Performance figures on benchmark workflows

3

In this (short) talk...

Page 8: Paper presentation: Taverna, reloaded

Target -ilities

4

• Scalability– size of input data / size of input collections

• Configurability

– each processor in the workflow individually tunable to adjust to the underlying platform

• Extensibility– plugin architecture to accommodate specific service

groups• caGrid, BioMoby

– extensible set of operators that define the semantics of the model

Page 9: Paper presentation: Taverna, reloaded

Opportunities for parallel processing

5

[ id1, id2, id3, ...]

• intra-processor: implicit iteration over list data• inter-processor: pipelining

Page 10: Paper presentation: Taverna, reloaded

Opportunities for parallel processing

5

[ id1, id2, id3, ...]

• intra-processor: implicit iteration over list data• inter-processor: pipelining

SFH1 SFH2 SFH3

... ... ...

getDS1 getDS2 getDS3

Page 11: Paper presentation: Taverna, reloaded

Opportunities for parallel processing

5

[ id1, id2, id3, ...]

• intra-processor: implicit iteration over list data• inter-processor: pipelining

[ DS1, DS2, DS3, ...]

SFH1 SFH2 SFH3

... ... ...

getDS1 getDS2 getDS3

Page 12: Paper presentation: Taverna, reloaded

Default dispatch stack

Parallelise

Error bounce

Failover

Stop/pause

Retry

Request

Response

Provenance

Invoke

Configurable processor behaviour

allocates concurrent threads to list elements

captures service invocation events

persistent server error:tries each available service implementation in turn

transient errors:retries one service invocation

interacts with underlying activity- service operation- shell script execution

Page 13: Paper presentation: Taverna, reloaded

Extensible processor semantics

7

Default dispatch stack

Parallelise

Error bounce

Failover

Stop/pause

Retry

Request

Response

Provenance

Invoke

Page 14: Paper presentation: Taverna, reloaded

Extensible processor semantics

7

Default dispatch stack

Parallelise

Error bounce

Failover

Stop/pause

Retry

Request

Response

Provenance

Invoke

Default dispatch stack

Parallelise

Error bounce

Failover

Invoke

Retry

Re

qu

es

t

(a) (b)

Re

sp

on

se

Re

qu

es

t

Re

sp

on

se

Extended dispatch stack

Parallelise

Error bounce

Failover

Invoke

Retry

Loop/Branching

Enhancing dataflow model with limited form of loops to model busy waiting(asynchronous service interaction)

Page 15: Paper presentation: Taverna, reloaded

Extensible processor semantics

7

Default dispatch stack

Parallelise

Error bounce

Failover

Stop/pause

Retry

Request

Response

Provenance

Invoke

Default dispatch stack

Parallelise

Error bounce

Failover

Invoke

Retry

Re

qu

es

t

(a) (b)

Re

sp

on

se

Re

qu

es

t

Re

sp

on

se

Extended dispatch stack

Parallelise

Error bounce

Failover

Invoke

Retry

Loop/Branching

Enhancing dataflow model with limited form of loops to model busy waiting(asynchronous service interaction)

!"#$#%&'()(*+,-%$.*//$0*1

2*(34/*5678&.8&9

2*(34/*56#%.8&9

!"#$%&'&(%

!"#$)*

+"#,-

./0.12&'&(%

%&'&(% '&&'./304&5,%&

+"#$%&'&(%

)%$-"40

,%$-"40

&0%&

2(..0%%

+"#,- &670

80&$9:5$;0%(<&

;0%(<& '&&'./304&5,%&

."4&04& 7';'3%

;(4)4&0;=;"2.'4

'&&'./304&5,%& +"#,-

Busy waiting using an explicit workflow pattern

Page 16: Paper presentation: Taverna, reloaded

Extensible processor semantics

7

Default dispatch stack

Parallelise

Error bounce

Failover

Stop/pause

Retry

Request

Response

Provenance

Invoke

Default dispatch stack

Parallelise

Error bounce

Failover

Invoke

Retry

Re

qu

es

t

(a) (b)

Re

sp

on

se

Re

qu

es

t

Re

sp

on

se

Extended dispatch stack

Parallelise

Error bounce

Failover

Invoke

Retry

Loop/Branching

Enhancing dataflow model with limited form of loops to model busy waiting(asynchronous service interaction)

!"#$#%&'()(*+,-%$.*//$0*1

2*(34/*5678&.8&9

2*(34/*56#%.8&9

!"#$%&'&(%

!"#$)*

+"#,-

./0.12&'&(%

%&'&(% '&&'./304&5,%&

+"#$%&'&(%

)%$-"40

,%$-"40

&0%&

2(..0%%

+"#,- &670

80&$9:5$;0%(<&

;0%(<& '&&'./304&5,%&

."4&04& 7';'3%

;(4)4&0;=;"2.'4

'&&'./304&5,%& +"#,-

Busy waiting using an explicit workflow pattern

!"#$% &'()

*)&+,-.+/)012&

/)012& 3&&3456)7&.$0&

(3/360 4"7&)7&

/1787&)/9/":437

3&&3456)7&.$0& !"#$%

!"#$%

45)4;:&3&10

3&&3456)7&.$0& 0&3&10

Busy waiting using an loop processor

Page 17: Paper presentation: Taverna, reloaded

Performance experimental setup• previous version of Taverna engine used as baseline• objective: to measure incremental improvement

8

mul

tiple

par

alle

l pip

elin

es

list generatorParameters:- byte size of list elements (strings)- size of input list- length of linear chain

main insight: when the workflow is designed for pipelining, parallelism is exploited effectively

Page 18: Paper presentation: Taverna, reloaded

I - Memory usage

9

T2 main memory data management

T2 main embedded Derby back-end

T1 baseline

list size: 1,000 strings of 10K chars eachno intra-processor parallelism (1 thread/processor)

shorter execution time due to pipelining

Page 19: Paper presentation: Taverna, reloaded

I - Memory usage

9

T2 main memory data management

T2 main embedded Derby back-end

T1 baseline

list size: 1,000 strings of 10K chars eachno intra-processor parallelism (1 thread/processor)

shorter execution time due to pipelining

Page 20: Paper presentation: Taverna, reloaded

Available processors pool

10

pipelining in T2 makes up for smaller pools of threads/processor

Page 21: Paper presentation: Taverna, reloaded

Bounded main memory usage

11

Separation of data and process spaces ensures scalable data management

varying data element size:10K, 25K, 100K chars

Page 22: Paper presentation: Taverna, reloaded

Summary• Taverna 2 re-engineered for scalability on data-

intensive pipelined dataflows

• separation of data and process spaces• generic stack pattern for principled extensions

– limited while loop, if-then-else– provenance capture– ... open plugin architecture accommodates further

extensions

12

For further details and a nice chat:please visit our poster!!


Top Related