Workflow management within DIET
Raphaël Bolze
LIP ENS Lyon, CNRS, INRIA Rhône-Alpes
GRAAL project
http://graal.ens-lyon.fr
R. Bolze – 19 oct 2006 Edinburgh
Introduction
• Distributed Interactive Engineering Toolbox
  – RPC and grid computing: GridRPC
  – DIET goals
  – DIET environment & architecture
  – Request management
  – Research topics & features
• DIET and workflow management
  – Needs
  – Language
  – Architectures
  – Scheduling proposal
• Target applications
  – PipeAlign
  – Docking
  – Robinson
  – Cosmology
• Current work
Distributed Interactive Engineering Toolbox
RPC and Grid-Computing: GridRPC
• One simple idea
  – One simple (and efficient) paradigm for grid computing: offering (or leasing) computational power and/or storage capacity through the Internet
  – One simple solution: implementing the RPC programming model over the Grid
    – Using resources accessible through the network
    – Mixed parallelism model (data-parallel model at server level and task parallelism between servers)
• Features needed
  – Load balancing (resource localization and performance evaluation, scheduling)
  – Data and replica management
  – Security
  – Fault tolerance
  – Interoperability with other systems
  – …
• Design of a standard interface
  – Within the GGF/OGF (GridRPC WG, C. Lee)
  – www.ogf.org, forge.gridforum.org/projects/gridrpc-wg
  – Existing implementations: GridSolve, Ninf, DIET, XtremWeb
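RPC and Grid Computing: GridRPC
[Figure: the Client sends a Request to the AGENT(s), which answer with the selected server ("S2 !") among S1, S2, S3, S4. The Client then calls Op(C, A, B) on S2, passing A, B, C, and receives the Answer (C).]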
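The request flow in the figure can be sketched in a few lines of Python. This is an illustration only, not the GridRPC API: the `Server` and `Agent` classes, the `load` attribute, the least-loaded selection policy, and the use of addition for `Op` are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Server:
    """Hypothetical server offering one operation, Op(A, B) -> C."""
    name: str
    load: float  # assumed metric: lower means less busy

    def op(self, a, b):
        # Stand-in computation; a real server would run a numerical service.
        return a + b


class Agent:
    """Hypothetical agent that knows the servers and picks one per request."""

    def __init__(self, servers):
        self.servers = servers

    def find_server(self):
        # Choose the least loaded server (assumed selection policy).
        return min(self.servers, key=lambda s: s.load)


def client_call(agent, a, b):
    # Step 1: the client asks the agent which server to use.
    server = agent.find_server()
    # Step 2: the client sends the data straight to that server and gets C back.
    return server.name, server.op(a, b)


servers = [Server("S1", 0.9), Server("S2", 0.1), Server("S3", 0.5), Server("S4", 0.7)]
chosen, c = client_call(Agent(servers), 2, 3)
print(chosen, c)
```

The point of the two-step call is that the agent only brokers the choice of server; the data itself travels directly between client and server.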
DIET’s Goals
• Our goals
  – To develop a toolbox for the deployment of environments using the Application Service Provider (ASP) paradigm with different applications
  – To use public-domain and standard software as much as possible
  – To obtain a high-performance and scalable environment
  – To implement and validate our more theoretical results: scheduling for heterogeneous platforms, data (re)distribution and replication, performance evaluation, algorithmics for heterogeneous and distributed platforms, …
• Based on CORBA, NWS, LDAP, and our own software developments
  – CoRI for performance evaluation (FAST, CoRI-Easy)
  – LogService for monitoring
  – VizDIET for visualization
  – GoDIET for deployment
• Several applications in different fields (simulation, bioinformatics, cosmology, …)
• Release 2.1 available on the web
• Release 2.2 coming soon
http://graal.ens-lyon.fr/DIET/
DIET Environment
[Figure: clients (C) reach servers (S) through a hierarchy of agents (A). The servers host sequential applications, parallel applications, and data management applications.]
DIET Architecture
[Figure: the Client contacts the Master Agent (MA), which heads a tree of Local Agents (LA) with Server Daemons at the leaves.]
Requests Management
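[Figure: request propagation through the agent hierarchy]
• Each agent forwards FindServer() to its children
• Each server runs estimate() { predExecTime(…); }
• Each agent runs Aggregate() { min(…); } over the responses it receives
• The client obtains bestServer = S3 and calls runService(…);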
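The propagation and aggregation steps can be sketched as a small tree traversal. This is a simplified model, not DIET code: the class names, the fixed predicted times, and the two-level tree are assumptions chosen so that S3 wins, as on the slide.

```python
class SeD:
    """Leaf server: answers a request with its predicted execution time."""

    def __init__(self, name, pred_exec_time):
        self.name = name
        self.pred_exec_time = pred_exec_time  # hypothetical fixed prediction

    def find_server(self):
        # estimate(): return (predicted time, server name) for this server.
        return (self.pred_exec_time, self.name)


class AgentNode:
    """Agent: forwards the request to its children and keeps the best answer."""

    def __init__(self, children):
        self.children = children

    def find_server(self):
        # Aggregate(): minimum over the children's responses.
        return min(child.find_server() for child in self.children)


la1 = AgentNode([SeD("S1", 12.0), SeD("S2", 8.5)])
la2 = AgentNode([SeD("S3", 3.2), SeD("S4", 9.9)])
ma = AgentNode([la1, la2])

best_time, best_server = ma.find_server()
print(best_server)
```

Because each agent reduces its children's answers before replying, the amount of data flowing upward stays constant per link regardless of how many servers sit below.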
Research Topics
• Scheduling
  – Distributed scheduling
  – Plug-in schedulers
• Data management
  – Scheduling of computation requests and links with data management
  – Replication, data prefetching
• Deployment
  – Mapping components on available (selected) resources
  – Software platform deployment with or without dynamic connections between components
• Performance evaluation
  – Application modeling
  – Dynamic information about the platform (network, clusters)
• Fault tolerance
  – Failure detection
  – Application recovery
  – …
Scheduling
DIET Scheduling
• SeD level
  – Performance estimation function
  – Estimation metric vector (estVector_t): dynamic collection of performance estimation values
    – Performance measures available through DIET: FAST-NWS performance metrics, time elapsed since the last execution, CoRI (Collector of Resource Information)
    – Developer-defined values
  – Standard estimation tags for accessing the fields of an estVector_t: EST_FREEMEM, EST_TCOMP, EST_TIMESINCELASTSOLVE, EST_FREECPU
• Aggregation methods
  – Define how SeD responses are sorted: associated with the service and defined at SeD level
  – Tunable comparison/aggregation routines for scheduling
  – Priority scheduler
    – Performs pairwise server estimation comparisons, returning a sorted list of server responses
    – Can minimize or maximize based on SeD estimations, taking into consideration the order in which the requests for those performance estimations were specified at SeD level
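The priority scheduler described above can be illustrated with a pairwise comparator over an ordered list of criteria. This is a sketch under stated assumptions, not the DIET implementation: the dictionary layout, the sample values, and the `priorities` list are hypothetical; only the tag names come from the slide.

```python
from functools import cmp_to_key

# Each SeD response is modeled as a dict from estimation tag to value.
responses = {
    "S1": {"EST_FREECPU": 0.50, "EST_FREEMEM": 2048, "EST_TCOMP": 4.0},
    "S2": {"EST_FREECPU": 0.90, "EST_FREEMEM": 1024, "EST_TCOMP": 6.0},
    "S3": {"EST_FREECPU": 0.90, "EST_FREEMEM": 4096, "EST_TCOMP": 5.0},
}

# Ordered priority list: (tag, direction). Earlier entries dominate later ones.
priorities = [("EST_FREECPU", "max"), ("EST_TCOMP", "min")]


def compare(a, b):
    """Pairwise comparison of two server names; negative means a ranks first."""
    for tag, direction in priorities:
        va, vb = responses[a][tag], responses[b][tag]
        if va == vb:
            continue  # tie on this criterion: fall through to the next one
        better_a = va > vb if direction == "max" else va < vb
        return -1 if better_a else 1
    return 0


ranked = sorted(responses, key=cmp_to_key(compare))
print(ranked)
```

Here S2 and S3 tie on free CPU, so the second criterion (smaller computation time) breaks the tie in favor of S3.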
DIET Scheduling
• Collector of Resource Information (CoRI)
• CoRI-Easy: provides basic measurements of the environment
• CoRI Manager: manages the use of different collectors
[Figure: the CoRI Manager sits on top of the CoRI-Easy collector, the FAST collector (backed by the FAST software), and other collectors such as Ganglia.]
Data management
Data/replica management
• Two needs
  – Keep the data in place to reduce the overhead of communications between clients and servers
  – Replicate data whenever possible
• Two approaches for DIET
  – DTM (LIFC, Besançon)
    – Hierarchy similar to DIET’s
    – Distributed data manager
    – Redistribution between servers
  – JuxMem (Paris, Rennes)
    – P2P data cache
• Work done within the GridRPC Working Group (OGF)
  – Relations with workflow management
[Figure: data items (A, B, F, G, X, Y) exchanged between two clients, Server 1, and Server 2; item B appears in several places.]
Data management with DTM within DIET
• Persistence at the server level
• To avoid useless data transfers
  – Intermediate results
  – Between clients and servers
  – Between servers
  – “Transparent” for the client
• Data Manager/Loc Manager
  – Hierarchy mapped on the DIET one
  – Modularity
• Proposition to the GridRPC WG (OGF)
  – Data handles
  – Persistence flag
  – Data management functions
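The idea behind data handles and a persistence flag can be shown with a toy server-side store. This is purely illustrative and does not reproduce the DTM or GridRPC interfaces: the `DataManager` class, its methods, the handle format, and the byte counter are all assumptions.

```python
import itertools


class DataManager:
    """Hypothetical server-side store for persistent data."""

    def __init__(self):
        self._store = {}
        self._ids = itertools.count(1)
        self.bytes_received = 0  # counts data actually sent by clients

    def put(self, value, persistent):
        """Receive data from a client; return a handle if it is kept."""
        self.bytes_received += len(value)
        if not persistent:
            return None
        handle = f"data-{next(self._ids)}"
        self._store[handle] = value
        return handle

    def get(self, handle):
        """Resolve a handle to data already held on the server."""
        return self._store[handle]


dm = DataManager()
matrix = b"x" * 1000

# First call: the data travels to the server and is marked persistent.
handle = dm.put(matrix, persistent=True)
# Second call: only the handle travels; no new transfer is counted.
same = dm.get(handle)
print(handle, dm.bytes_received)
```

A later request that reuses the same input can pass the handle instead of the data, which is how intermediate results avoid a round trip through the client.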
JUXMEM
• A peer-to-peer architecture for a data-sharing service in memory
• Persistence and data coherency mechanism
• Transparent data localization
PARIS project, IRISA, France
[Figure: peers, each with a peer ID, communicating over TCP/IP and HTTP, including across firewalls.]
• Toolbox for the development of P2P applications
• Set of protocols
• One peer
  – Unique ID
  – Several communication protocols (TCP, HTTP, …)
Deployment and visualization
Deployment Management
[Figure: GoDIET takes an XML description (resources: machines and storage; DIET hierarchy) and performs the distributed deployment of DIET. LogService collects traces from DIET, and VizDIET displays a trace subset for administration.]
VizDIET
Workflow management
Workflow Management: needs?
Workflow representation: Directed Acyclic Graph (DAG)
• Each vertex is a task
• Each directed edge represents communication between tasks
Questions: Ordering problem? Mapping problem?
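The ordering problem has a simple baseline: any topological order of the DAG respects the dependencies. The sketch below uses Python's standard `graphlib` module on a hypothetical five-task workflow; the task names are invented for illustration. It does not address the mapping problem (which server runs each task).

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

# Hypothetical DAG: each task maps to the set of tasks it depends on.
dependencies = {
    "filter": {"search"},
    "align_a": {"filter"},
    "align_b": {"filter"},
    "merge": {"align_a", "align_b"},
}

order = list(TopologicalSorter(dependencies).static_order())
position = {task: i for i, task in enumerate(order)}
print(order)
```

The two alignment tasks have no edge between them, so they may appear in either order and could run in parallel on different servers.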
Workflow Management: goals
Goals
• Build and execute workflows
• Use different heuristic methods to solve scheduling problems
• Extensibility to address multi-workflow submission and large grid platforms
• Manage heterogeneity and variability of the environment
Workflow Management: existing languages?
Workflow languages: no standard (XML, scripts)
Examples:
• Condor DAGMan: script
• Pegasus: DAX (XML)
• Taverna: XScufl (XML)
Two levels of description:
• Abstract: application description
• Concrete: execution description
Workflow Management
Workflow description in DIET
• XML format
• DIET profile: problem (id), parameters (in, inout, out)
• Description of tasks and data dependencies

<!-- NORMD 2 -->
<node id="normd2" path="normd">
  <in name="in_file" type="DIET_FILE" source="rascal1#out_file" />
  <out name="normd_value" type="DIET_FLOAT" />
  <out name="srv_time" type="DIET_DOUBLE" />
  <prec id="rascal1" />
</node>

<!-- LEON 1 -->
<node id="leon1" path="leon">
  <arg name="protein_name" type="DIET_STRING" value="P07942" />
  <in name="clustalw_file" type="DIET_FILE" source="clustalw1#out_file" />
  <in name="rascal_file" type="DIET_FILE" source="rascal1#out_file" />
  <in name="clustalw_normd" type="DIET_FLOAT" source="normd1#normd_value" />
  <in name="rascal_normd" type="DIET_FLOAT" source="normd2#normd_value" />
  <out name="srv_time" type="DIET_DOUBLE" />
  <out name="out_file1" type="DIET_FILE" />
  <out name="out_file2" type="DIET_FILE" />
  <prec id="normd2" />
</node>
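To show how such a description yields a dependency graph, the sketch below parses a shortened copy of the two nodes with Python's standard `xml.etree.ElementTree`. The enclosing `<dag>` root element is an assumption (the slide shows only the `<node>` elements), and reading `source="task#port"` as a data dependency on `task` is an interpretation of the fragment, not documented DIET behavior.

```python
import xml.etree.ElementTree as ET

# The fragment is wrapped in a <dag> root element, which is an assumption:
# the slide shows only the two <node> elements.
WORKFLOW = """
<dag>
  <node id="normd2" path="normd">
    <in name="in_file" type="DIET_FILE" source="rascal1#out_file" />
    <out name="normd_value" type="DIET_FLOAT" />
    <prec id="rascal1" />
  </node>
  <node id="leon1" path="leon">
    <in name="rascal_normd" type="DIET_FLOAT" source="normd2#normd_value" />
    <out name="out_file1" type="DIET_FILE" />
    <prec id="normd2" />
  </node>
</dag>
"""

root = ET.fromstring(WORKFLOW)
tasks = {}
for node in root.findall("node"):
    tasks[node.get("id")] = {
        "service": node.get("path"),
        # Control dependencies come from <prec>.
        "prec": [p.get("id") for p in node.findall("prec")],
        # Data dependencies come from the "task#port" value of source.
        "data_from": [i.get("source").split("#")[0]
                      for i in node.findall("in") if i.get("source")],
    }
print(tasks)
```

The resulting `prec` lists give exactly the edge set a scheduler needs to compute an execution order.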
Workflow Management: architecture
Two architectures:
• Meta-scheduler on the client side
• Meta-scheduler distributed between the client and the MA-DAG
Workflow Management: meta-scheduler on the client
Architecture 1: meta-scheduler on the client side
[Figure: the Client connects to the MA, which heads three LAs with SeDs at the leaves.]
Workflow management: meta-scheduler on the client
• Disadvantages
  – No coordination between the different clients
  – Depends on client capability
• Benefits
  – More flexible for evolution: the client can use its own algorithm
  – More scalable, depending on client capability
Workflow management
Architecture 2: meta-scheduler distributed between the client and the MA-DAG
[Figure: the Client connects to the MA and to the MA DAG; the MA heads three LAs with SeDs at the leaves.]
Workflow management: meta-scheduler
• Base schedulers
  – No ranking, respecting the topological order of the DAG
  – HEFT heuristic
• Flexibility
  – Architecture 1
    – The client can have its own scheduler
    – No need to rebuild the platform
  – Architecture 2
    – Schedulers are defined at compile time
    – The platform must be rebuilt if someone decides to change them
[Figure: class diagram]
Abstract Workflow Scheduler
  virtual void execute();
  virtual void reSchedule();
User-defined Scheduler
  virtual void execute();
  virtual void reSchedule();
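The class diagram suggests an abstract interface that user-defined schedulers override. The sketch below mirrors that shape in Python and plugs in the task-prioritization phase of HEFT (the upward rank); it omits HEFT's processor-selection phase. The class names, constructor arguments, and all cost values are hypothetical.

```python
from abc import ABC, abstractmethod


class AbstractWorkflowScheduler(ABC):
    """Mirrors the two virtual methods shown on the slide."""

    @abstractmethod
    def execute(self):
        """Produce an execution order for the workflow."""

    @abstractmethod
    def re_schedule(self):
        """Recompute the order, e.g. after the platform changes."""


class RankScheduler(AbstractWorkflowScheduler):
    """User-defined scheduler ordering tasks by HEFT-style upward rank."""

    def __init__(self, successors, cost, comm):
        self.successors = successors  # task -> list of successor tasks
        self.cost = cost              # task -> assumed mean compute cost
        self.comm = comm              # (task, successor) -> assumed transfer cost

    def _rank(self, task, memo):
        # rank(t) = cost(t) + max over successors s of (comm(t, s) + rank(s))
        if task not in memo:
            tail = max((self.comm.get((task, s), 0) + self._rank(s, memo)
                        for s in self.successors.get(task, [])), default=0)
            memo[task] = self.cost[task] + tail
        return memo[task]

    def execute(self):
        memo = {}
        # Highest rank first: tasks on the critical path are served earliest.
        return sorted(self.cost, key=lambda t: -self._rank(t, memo))

    def re_schedule(self):
        return self.execute()


scheduler = RankScheduler(
    successors={"a": ["b", "c"], "b": ["d"], "c": ["d"]},
    cost={"a": 2, "b": 3, "c": 1, "d": 2},
    comm={("a", "b"): 1, ("a", "c"): 1, ("b", "d"): 2, ("c", "d"): 1},
)
order = scheduler.execute()
print(order)
```

With these costs the ranks are a = 10, b = 7, c = 4, d = 2, so sorting by decreasing rank also yields a valid topological order.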
Target applications
Docking Application
Detection of protein-protein and protein-DNA interactions. Screening a database containing thousands of proteins for functional sites involved in binding to other proteins, DNA or ligand targets.
[Figure: workflow with a params task, several parallel docking tasks, and a merge task.]
PipeAlign Application
The sequence-to-function relationship can be understood through the analysis of conserved patterns and evolution of protein organization mainly based on amino acid sequence comparisons in the context of the multiple alignments.
[Figure: PipeAlign workflow with tasks blastall, ballast, filtering, clustalw, rascal, normd (twice), and leon.]
Robinson application
This application annotates human genes according to their expression in neurological or muscular tissues, and also according to the expression of their homologs in other species.
[Figure: workflow with six parallel extract tasks, a Build DB task, and four parallel blastall tasks.]
Cosmology application
[Figure: cosmology workflow with tasks rollWhiteNoise, Grafic1 (several instances), Grafic2 (several instances), Ramses3D, HaloMaker (several instances), and TreeMaker + GalaxyMaker.]
• Simulates the evolution of dark matter particles over time, for comparison with real observations.
Centre de Recherche en Astronomie de Lyon
Current Work
Multi-Workflow
• Deal with multiple workflow submissions
• On-line scheduling, different submission times
• Implement “fair” scheduling strategies
• Implement specific scheduling heuristics
• Distribute the workflow management
Multi-Workflow
Simulations
Real experiments on Grid’5000
Conclusion
• DIET is workflow-enabled
• Data management: DTM, JuxMem
• Performance information: CoRI, FAST
• Plug-in schedulers
• Multi-applications
Questions ?
http://graal.ens-lyon.fr