Ewa Deelman, [email protected] www.isi.edu/~deelman pegasus.isi.edu
Integrating Existing Scientific Workflow Systems:
The Kepler/Pegasus Example
Nandita Mangal, Ewa Deelman, Gaurang Mehta, Mei-Hui Su, Karan Vahi
USC Information Sciences Institute
Marina del Rey, CA
pegasus.isi.edu
Motivation
Many workflow systems exist today
The choice of a particular system is often dictated by who you know
Various workflow systems have different capabilities
  Application components versus services
  Visual vs. scripting workflow descriptions
  Performance optimization, etc.
Can you combine two separate systems?
What are the issues?
Kepler (UCSD and UCDavis)
Scientific workflow management system based on Ptolemy II
Allows scientists to visually design and execute scientific workflows
Actor-oriented model with directors acting as the main workflow engine
Enables different models of computation.
Pegasus (USC/ISI)
Based on programming language principles
Leverages abstraction for workflow description to obtain ease of use, scalability, and portability
Provides a compiler to map from high-level descriptions to executable workflows
  Correct mapping
  Performance-enhanced mapping
Relies on a runtime engine to carry out the instructions
  In a scalable manner
  In a reliable manner
Combining Kepler & Pegasus
Integration of Kepler visual programming environment with the grid mapping abilities of Pegasus
Giving Kepler users the ability to map their large workflows onto the grid
Giving Pegasus users a visual workflow composition tool
Differences in the level of abstraction of workflow description
Kepler Provenance Challenge Workflow
Concrete Workflow Generation and Mapping
Implementation Strategy
Develop Pegasus-specific entities
  Abstract jobs
  Directors and actors
“Pegasus Director” and “Pegasus Jobs (Actor Entities)” act as the main grid components to execute a given grid computation
Focus mainly on abstract jobs in the Kepler environment
  Portable workflow descriptions that require no knowledge of resources
Integration
[Diagram labels: Kepler; Abstract Workflow in DAX format; Pegasus; Transformation Catalog; Data Registry; Executable Workflow; DAGMan; Tasks; Distributed Environment; Monitoring Information]
Pegasus Actor & Director Entities
Visual Abstract Workflow Creation
Users can create visual models of abstract workflows and specify logical transformations without specifying grid resources
Job Abstract Configuration--Integration with the Transformation Catalog
Resultant Abstract Job on Kepler Canvas:
A Pegasus abstract job can take multiple input files and produce multiple output files.
Grid resource information is not expected in such an actor.
Support for Concrete Jobs: useful for monitoring and debugging
A concrete job requires specific grid resource information from the scientist.
Allows the scientist to directly execute jobs on the grid
Pegasus Director/ DAX Generator
Controls the execution of all the job (actor) entities and creates a resulting directed acyclic graph in XML format
Generates a DAX
Gives it to DAGMan for execution
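The DAX-generation step can be sketched roughly as follows. This is a minimal illustration and not the Pegasus Director's actual code: the function name build_dax, the tuple layout for jobs, and the rule that dependencies are inferred from shared file names are all assumptions made for the example. Element and attribute names follow the sample DAX in this deck.

```python
import xml.etree.ElementTree as ET

DAX_NS = "http://www.griphyn.org/chimera/DAX"


def build_dax(name, jobs):
    """Build a DAX-like <adag> element (hypothetical helper).

    jobs is a list of (job_name, input_files, output_files) tuples.
    """
    # Setting "xmlns" as a plain attribute is a shortcut for serialization.
    adag = ET.Element("adag", {"xmlns": DAX_NS, "version": "1.10", "name": name})
    producer = {}  # file name -> id of the job that writes it

    # First pass: one <job> per actor, with a <uses> entry per file.
    for job_name, inputs, outputs in jobs:
        job_id = f"{job_name}_id"
        job = ET.SubElement(adag, "job", {
            "id": job_id, "namespace": "keplerdax",
            "name": job_name, "version": "1",
        })
        for f in inputs:
            ET.SubElement(job, "uses", {"file": f, "link": "input"})
        for f in outputs:
            ET.SubElement(job, "uses", {"file": f, "link": "output"})
            producer[f] = job_id

    # Second pass: a job depends on whichever jobs produce its inputs.
    for job_name, inputs, _ in jobs:
        parents = sorted({producer[f] for f in inputs if f in producer})
        if parents:
            child = ET.SubElement(adag, "child", {"ref": f"{job_name}_id"})
            for parent_id in parents:
                ET.SubElement(child, "parent", {"ref": parent_id})
    return adag


jobs = [
    ("Job1", ["FileA.png"], ["FileB.txt", "FileC.txt"]),
    ("Job2", ["FileB.txt"], ["FileE.png"]),
    ("Job3", ["FileC.txt"], ["FileD.xml"]),
    ("Job4", ["FileE.png", "FileD.xml"], ["FileF.png"]),
]
dax = build_dax("WorkflowTemplate", jobs)
print(ET.tostring(dax, encoding="unicode"))
```

Unlike the sample DAX, which emits one child element per edge, this sketch groups all parents of a job under a single child element; the set of dependencies it describes is the same.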
Sample Generated DAX:
<?xml version="1.0" encoding="UTF-8"?>
<!-- generated: 2006-12-03T19:27:27-08:00 -->
<!-- generated by: Nandita [??] -->
<adag xsi:schemaLocation="http://www.griphyn.org/chimera/DAX http://www.griphyn.org/chimera/dax-1.10.xsd"
      xmlns="http://www.griphyn.org/chimera/DAX"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      version="1.10" count="1" index="0" name="WorkflowTemplate_1_1156808715390">
  <!-- part 1: list of all referenced files (may be empty) -->
  <!-- part 2: definition of all jobs (at least one) -->
  <job id="Job1_id" namespace="keplerdax" name="Job1" version="1">
    <argument><filename file="FileA.png"/> <filename file="FileB.txt"/></argument>
    <uses file="FileA.png" link="input"/>
    <uses file="FileB.txt" link="output"/>
    <uses file="FileC.txt" link="output"/>
  </job>
  <job id="Job2_id" namespace="keplerdax" name="Job2" version="1">
    <argument><filename file="FileB.txt"/> <filename file="FileE.png"/></argument>
    <uses file="FileB.txt" link="input"/>
    <uses file="FileE.png" link="output"/>
  </job>
  <job id="Job3_id" namespace="keplerdax" name="Job3" version="1">
    <argument><filename file="FileC.txt"/><filename file="FileD.xml"/></argument>
    <uses file="FileC.txt" link="input"/>
    <uses file="FileD.xml" link="output"/>
  </job>
  <job id="Job4_id" namespace="keplerdax" name="Job4" version="1">
    <argument><filename file="FileE.png"/> <filename file="FileF.png"/></argument>
    <uses file="FileE.png" link="input"/>
    <uses file="FileD.xml" link="input"/>
    <uses file="FileF.png" link="output"/>
  </job>
  <!-- part 3: list of control-flow dependencies (may be empty) -->
  <child ref="Job2_id">
    <parent ref="Job1_id"/>
  </child>
  <child ref="Job3_id">
    <parent ref="Job1_id"/>
  </child>
  <child ref="Job4_id">
    <parent ref="Job2_id"/>
  </child>
  <child ref="Job4_id">
    <parent ref="Job3_id"/>
  </child>
</adag>
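To see how the part 3 dependencies constrain execution, the following sketch reads a trimmed copy of the DAX above and derives one valid job order. It is an illustration only, not how Pegasus or DAGMan schedule jobs: the function name execution_order is hypothetical, the embedded document keeps only job ids and dependencies, and graphlib requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET
from graphlib import TopologicalSorter

NS = {"dax": "http://www.griphyn.org/chimera/DAX"}

# Trimmed copy of the sample DAX: job ids and control-flow dependencies only.
SAMPLE_DAX = """
<adag xmlns="http://www.griphyn.org/chimera/DAX" version="1.10" name="sample">
  <job id="Job1_id" namespace="keplerdax" name="Job1" version="1"/>
  <job id="Job2_id" namespace="keplerdax" name="Job2" version="1"/>
  <job id="Job3_id" namespace="keplerdax" name="Job3" version="1"/>
  <job id="Job4_id" namespace="keplerdax" name="Job4" version="1"/>
  <child ref="Job2_id"><parent ref="Job1_id"/></child>
  <child ref="Job3_id"><parent ref="Job1_id"/></child>
  <child ref="Job4_id"><parent ref="Job2_id"/></child>
  <child ref="Job4_id"><parent ref="Job3_id"/></child>
</adag>
"""


def execution_order(dax_text):
    """Return job ids in an order that respects every parent/child edge."""
    root = ET.fromstring(dax_text)
    # Map each job id to the set of job ids it must wait for.
    deps = {job.get("id"): set() for job in root.findall("dax:job", NS)}
    for child in root.findall("dax:child", NS):
        for parent in child.findall("dax:parent", NS):
            deps.setdefault(child.get("ref"), set()).add(parent.get("ref"))
    return list(TopologicalSorter(deps).static_order())


order = execution_order(SAMPLE_DAX)
print(order)
```

Job1 has no parents and Job4 depends on both branches, so any valid order starts with Job1_id and ends with Job4_id; Job2 and Job3 are independent of each other and could run in parallel.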
Provenance Challenge Workflow in Kepler/Pegasus
In Kepler, each node needs a unique name, so the Transformation Catalog (TC) needs many duplicate entries
Integration Benefit for Pegasus users
Visualizing/Debugging Existing Models:
Support a scientist trying to redo, visualize, or easily re-configure an existing DAX
Provide an option to upload existing DAX files into the workspace
Convert the specified DAX file into MoML (Kepler’s format) by passing it through an XSLT processor and generating the required directors and actors on the canvas
Issues of scalability (only small workflows can be visualized)
  Scoping may need to be applied
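The DAX-to-MoML conversion can be pictured with the sketch below. The slides describe an XSLT processor; this example swaps in Python's ElementTree as a stand-in, since the Python standard library has no XSLT engine. The actor class name example.pegasus.PegasusJobActor and the port names input and output are hypothetical, the director is omitted, and the output has not been checked against Kepler.

```python
import xml.etree.ElementTree as ET

NS = {"dax": "http://www.griphyn.org/chimera/DAX"}
JOB_ACTOR_CLASS = "example.pegasus.PegasusJobActor"  # hypothetical class name

SMALL_DAX = """
<adag xmlns="http://www.griphyn.org/chimera/DAX" version="1.10" name="demo">
  <job id="Job1_id" namespace="keplerdax" name="Job1" version="1"/>
  <job id="Job2_id" namespace="keplerdax" name="Job2" version="1"/>
  <child ref="Job2_id"><parent ref="Job1_id"/></child>
</adag>
"""


def dax_to_moml(dax_text):
    """Turn each DAX job into a MoML-style entity and each edge into a relation."""
    adag = ET.fromstring(dax_text)
    top = ET.Element("entity", {
        "name": adag.get("name", "workflow"),
        "class": "ptolemy.actor.TypedCompositeActor",
    })
    names = {}  # job id -> actor name shown on the canvas
    for job in adag.findall("dax:job", NS):
        names[job.get("id")] = job.get("name")
        ET.SubElement(top, "entity",
                      {"name": job.get("name"), "class": JOB_ACTOR_CLASS})

    edges = [(p.get("ref"), c.get("ref"))
             for c in adag.findall("dax:child", NS)
             for p in c.findall("dax:parent", NS)]
    for i, (parent_id, child_id) in enumerate(edges):
        rel = f"relation{i}"
        ET.SubElement(top, "relation",
                      {"name": rel, "class": "ptolemy.actor.TypedIORelation"})
        # Port names below are assumptions made for illustration.
        ET.SubElement(top, "link",
                      {"port": f"{names[parent_id]}.output", "relation": rel})
        ET.SubElement(top, "link",
                      {"port": f"{names[child_id]}.input", "relation": rel})
    return top


moml = dax_to_moml(SMALL_DAX)
print(ET.tostring(moml, encoding="unicode"))
```

Because every job becomes one actor and every edge one relation, the canvas grows with the size of the DAX, which is consistent with the scalability concern noted above.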
Integration Issues
Kepler acts as a visual programming environment
  Actors represent single units of computation with data flowing among them
  Some configuration is not intuitive (TC entries)
There is no concept of representing files separately in Kepler
  Each job has multiport I/O ports
  The user is given the option to connect as many files as needed going into and coming out of a port
Potential use of the integrated environment for debugging
Not done
  Integration with the Pegasus data registry
  No monitoring of execution in Kepler
  Use of Kepler’s workflow execution engine
  Support for Kepler actors in Pegasus
Relevant Links
Kepler: http://kepler-project.org
Pegasus: http://pegasus.isi.edu
DAGMan: www.cs.wisc.edu/condor/dagman/
Provenance challenge: http://twiki.ipaw.info/bin/view/Challenge/
  Workshop on Tuesday
NSF workshop on Challenges of Scientific Workflows: www.isi.edu/nsf-workflows06