Ewa Deelman, [email protected] www.isi.edu/~deelman pegasus.isi.edu

Integrating Existing Scientific Workflow Systems:

The Kepler/Pegasus Example

Nandita Mangal, Ewa Deelman, Gaurang Mehta, Mei-Hui Su, Karan Vahi

USC Information Sciences Institute

Marina del Rey, CA

pegasus.isi.edu


Motivation

Many workflow systems exist today

The choice of a particular system is often dictated by who you know

Various workflow systems have different capabilities: application components versus services, visual versus scripting workflow descriptions, performance optimization, etc.

Can you combine two separate systems? What are the issues?


Kepler (UCSD and UC Davis)

Scientific workflow management system based on Ptolemy II

Allows scientists to visually design and execute scientific workflows

Actor-oriented model with directors acting as the main workflow engine

Enables different models of computation.


Pegasus (USC/ISI)

Based on programming language principles

Leverages abstraction in the workflow description to obtain ease of use, scalability, and portability

Provides a compiler that maps high-level descriptions to executable workflows, aiming for a correct mapping and a performance-enhanced mapping

Relies on a runtime engine to carry out the instructions in a scalable and reliable manner
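The compiler's core step, binding a resource-independent job to a place where it can actually run, can be pictured as a lookup against a catalog of installed software. This is a minimal sketch, not Pegasus's planner or its API: it assumes a hypothetical in-memory transformation catalog that maps a logical transformation name to the sites where it is installed, and binds each abstract job to the first candidate site that has it. It illustrates only the "correct mapping" half; performance-driven site selection is left out.

```python
from dataclasses import dataclass

# Hypothetical stand-in for a transformation catalog (TC):
# logical transformation name -> {site name: physical executable path}
TransformationCatalog = dict[str, dict[str, str]]


@dataclass(frozen=True)
class AbstractJob:
    job_id: str
    transformation: str  # logical name only; no site or path information


@dataclass(frozen=True)
class ConcreteJob:
    job_id: str
    site: str
    executable: str


def map_job(job: AbstractJob, tc: TransformationCatalog,
            candidate_sites: list[str]) -> ConcreteJob:
    """Bind an abstract job to the first candidate site where its transformation is installed."""
    installed = tc.get(job.transformation, {})
    for site in candidate_sites:
        if site in installed:
            return ConcreteJob(job.job_id, site, installed[site])
    # A correct mapping never guesses: fail if no candidate site can run the job
    raise LookupError(f"{job.transformation!r} is not installed on any of {candidate_sites}")


if __name__ == "__main__":
    tc = {"align_warp": {"site_b": "/opt/bin/align_warp"}}
    print(map_job(AbstractJob("Job1_id", "align_warp"), tc, ["site_a", "site_b"]))
```

Failing loudly when no site qualifies, rather than picking an arbitrary one, is what keeps this toy mapping correct; a performance-enhanced mapper would rank the candidate sites instead of taking them in list order.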


Combining Kepler & Pegasus

Integrates Kepler's visual programming environment with Pegasus's grid-mapping capabilities

Gives Kepler users the ability to map their large workflows onto the grid

Gives Pegasus users a visual workflow composition tool

Must reconcile the different levels of abstraction at which the two systems describe workflows


Kepler Provenance Challenge Workflow


Concrete Workflow Generation and Mapping


Implementation Strategy

Develop Pegasus-specific entities: abstract jobs, directors, and actors

The "Pegasus Director" and "Pegasus Jobs" (actor entities) act as the main grid components that execute a given grid computation

Focus mainly on abstract jobs in the Kepler environment, which yield portable workflow descriptions that are independent of knowledge about specific resources
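The reason for a Pegasus-specific director is that, in Kepler's actor-oriented model, the director rather than the actors decides what "executing" the workflow means. The sketch below contrasts two directors over the same actors. `JobActor`, `LocalDirector`, and `PlanningDirector` are hypothetical names, not Kepler or Ptolemy II classes; the planning director only stands in for the idea of describing abstract jobs for a planner instead of running them.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class JobActor:
    """Hypothetical stand-in for a Kepler actor wrapping one job."""
    name: str
    inputs: list[str]
    outputs: list[str]
    run: Callable[[], None] = lambda: None  # what the actor does when fired locally


def dataflow_order(actors: list[JobActor]) -> list[JobActor]:
    """Order actors so that each one comes after the producers of its input files."""
    produced = {f for a in actors for f in a.outputs}
    # Files that no actor produces are workflow inputs, available from the start
    available = {f for a in actors for f in a.inputs if f not in produced}
    order, pending = [], list(actors)
    while pending:
        ready = [a for a in pending if set(a.inputs) <= available]
        if not ready:
            raise ValueError("cyclic or unsatisfiable dependencies")
        for a in ready:
            order.append(a)
            available.update(a.outputs)
            pending.remove(a)
    return order


class LocalDirector:
    """Fires every actor in-process, in dataflow order."""
    def execute(self, actors: list[JobActor]) -> list[str]:
        fired = []
        for actor in dataflow_order(actors):
            actor.run()
            fired.append(actor.name)
        return fired


class PlanningDirector:
    """Hypothetical Pegasus-style director: fires nothing, only describes the jobs
    so that a planner can later map them onto grid resources."""
    def execute(self, actors: list[JobActor]) -> list[dict]:
        return [{"name": a.name, "inputs": list(a.inputs), "outputs": list(a.outputs)}
                for a in dataflow_order(actors)]
```

Swapping the director changes the semantics of the whole canvas without touching any actor, which is what lets the same visual model be either run locally or handed off as an abstract workflow.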


Integration

[Figure: Kepler, Pegasus, and DAGMan components, labeled Abstract Workflow in DAX format, Executable Workflow, Tasks, Distributed Environment, Monitoring Information, Transformation Catalog, and Data Registry]


Pegasus Actor & Director Entities


Visual Abstract Workflow Creation

Users can create visual models of abstract workflows and specify logical transformations without specifying grid resources


Abstract Job Configuration: Integration with the Transformation Catalog


Resultant Abstract Job on Kepler Canvas:

A Pegasus abstract job can take multiple input files and produce multiple output files

Such an actor does not require grid resource information
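To make "multiple files in and out, no grid resources" concrete, the sketch below serializes such an actor as a DAX `<job>` element: one `<uses>` entry per input or output file, and no site or executable path anywhere. The `AbstractJob` class is hypothetical; the element and attribute names follow the sample DAX in this deck, and the `<argument>` element is omitted for brevity.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class AbstractJob:
    """Any number of logical input and output files; deliberately no site or path fields."""
    name: str
    inputs: list[str]
    outputs: list[str]
    namespace: str = "keplerdax"
    version: str = "1"

    def to_dax(self) -> ET.Element:
        """Emit a DAX-style <job> element with one <uses> entry per file."""
        job = ET.Element("job", {"id": f"{self.name}_id", "namespace": self.namespace,
                                 "name": self.name, "version": self.version})
        for f in self.inputs:
            ET.SubElement(job, "uses", {"file": f, "link": "input"})
        for f in self.outputs:
            ET.SubElement(job, "uses", {"file": f, "link": "output"})
        return job


if __name__ == "__main__":
    job = AbstractJob("Job1", ["FileA.png"], ["FileB.txt", "FileC.txt"])
    print(ET.tostring(job.to_dax(), encoding="unicode"))
```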


Support for Concrete Jobs: Useful for Monitoring and Debugging

A concrete job requires the scientist to supply specific grid resource information

This allows the scientist to execute jobs directly on the grid


Pegasus Director / DAX Generator

Controls the execution of all job (actor) entities and creates the resulting directed acyclic graph in XML format

Generates a DAX and gives it to DAGMan for execution
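In the sample DAX, every control-flow dependency can be read off the file usage: a job is a parent of any job that consumes a file it produces. The following is a minimal sketch of a DAX generator built on that rule with Python's standard `xml.etree.ElementTree`. The function names are hypothetical, it assumes each file has exactly one producer, and it emits a simplified `<adag>` (no schema location, file list, or arguments).

```python
import xml.etree.ElementTree as ET

DAX_NS = "http://www.griphyn.org/chimera/DAX"

# job name -> (input files, output files)
Jobs = dict[str, tuple[list[str], list[str]]]

# The four jobs of the sample DAX
SAMPLE_JOBS: Jobs = {
    "Job1": (["FileA.png"], ["FileB.txt", "FileC.txt"]),
    "Job2": (["FileB.txt"], ["FileE.png"]),
    "Job3": (["FileC.txt"], ["FileD.xml"]),
    "Job4": (["FileE.png", "FileD.xml"], ["FileF.png"]),
}


def infer_dependencies(jobs: Jobs) -> list[tuple[str, str]]:
    """(parent, child) pairs: a job is a parent of every job that reads a file it writes."""
    producer: dict[str, str] = {}
    for name, (_, outputs) in jobs.items():
        for f in outputs:
            if f in producer:
                raise ValueError(f"{f} is written by both {producer[f]} and {name}")
            producer[f] = name
    edges: list[tuple[str, str]] = []
    for name, (inputs, _) in jobs.items():
        for f in inputs:
            parent = producer.get(f)
            if parent is not None and parent != name and (parent, name) not in edges:
                edges.append((parent, name))
    return edges


def build_adag(workflow_name: str, jobs: Jobs) -> ET.Element:
    """Assemble job definitions (part 2) and control-flow dependencies (part 3)."""
    adag = ET.Element("adag", {"xmlns": DAX_NS, "version": "1.10", "name": workflow_name})
    for name, (inputs, outputs) in jobs.items():
        job = ET.SubElement(adag, "job", {"id": f"{name}_id", "namespace": "keplerdax",
                                          "name": name, "version": "1"})
        for f in inputs:
            ET.SubElement(job, "uses", {"file": f, "link": "input"})
        for f in outputs:
            ET.SubElement(job, "uses", {"file": f, "link": "output"})
    # One <child> per edge, as in the sample
    for parent, child in infer_dependencies(jobs):
        edge = ET.SubElement(adag, "child", {"ref": f"{child}_id"})
        ET.SubElement(edge, "parent", {"ref": f"{parent}_id"})
    return adag


if __name__ == "__main__":
    print(ET.tostring(build_adag("WorkflowTemplate", SAMPLE_JOBS), encoding="unicode"))
```

Fed the four sample jobs, `infer_dependencies` returns the same four parent/child pairs, in the same order, as part 3 of the sample DAX: Job1 to Job2, Job1 to Job3, Job2 to Job4, and Job3 to Job4.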


Sample Generated DAX:

<?xml version="1.0" encoding="UTF-8"?>
<!-- generated: 2006-12-03T19:27:27-08:00 -->
<!-- generated by: Nandita [??] -->
<adag xsi:schemaLocation="http://www.griphyn.org/chimera/DAX http://www.griphyn.org/chimera/dax-1.10.xsd"
      xmlns="http://www.griphyn.org/chimera/DAX"
      xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
      version="1.10" count="1" index="0" name="WorkflowTemplate_1_1156808715390">
  <!-- part 1: list of all referenced files (may be empty) -->
  <!-- part 2: definition of all jobs (at least one) -->
  <job id="Job1_id" namespace="keplerdax" name="Job1" version="1">
    <argument><filename file="FileA.png"/> <filename file="FileB.txt"/></argument>
    <uses file="FileA.png" link="input"/>
    <uses file="FileB.txt" link="output"/>
    <uses file="FileC.txt" link="output"/>
  </job>
  <job id="Job2_id" namespace="keplerdax" name="Job2" version="1">
    <argument><filename file="FileB.txt"/> <filename file="FileE.png"/></argument>
    <uses file="FileB.txt" link="input"/>
    <uses file="FileE.png" link="output"/>
  </job>
  <job id="Job3_id" namespace="keplerdax" name="Job3" version="1">
    <argument><filename file="FileC.txt"/><filename file="FileD.xml"/></argument>
    <uses file="FileC.txt" link="input"/>
    <uses file="FileD.xml" link="output"/>
  </job>
  <job id="Job4_id" namespace="keplerdax" name="Job4" version="1">
    <argument><filename file="FileE.png"/> <filename file="FileF.png"/></argument>
    <uses file="FileE.png" link="input"/>
    <uses file="FileD.xml" link="input"/>
    <uses file="FileF.png" link="output"/>
  </job>
  <!-- part 3: list of control-flow dependencies (may be empty) -->
  <child ref="Job2_id">
    <parent ref="Job1_id"/>
  </child>
  <child ref="Job3_id">
    <parent ref="Job1_id"/>
  </child>
  <child ref="Job4_id">
    <parent ref="Job2_id"/>
  </child>
  <child ref="Job4_id">
    <parent ref="Job3_id"/>
  </child>
</adag>
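Only the parent/child relations in part 3 are needed to decide what may run when. As an illustration (not DAGMan's actual scheduler), the sketch below parses a DAX with `xml.etree.ElementTree`, honoring the `http://www.griphyn.org/chimera/DAX` default namespace, and groups jobs into levels whose members have all their parents in earlier levels and so could run concurrently. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

NS = {"dax": "http://www.griphyn.org/chimera/DAX"}


def execution_levels(dax_xml: str) -> list[list[str]]:
    """Group job ids into levels; every job's parents lie in earlier levels."""
    root = ET.fromstring(dax_xml)
    jobs = [job.get("id") for job in root.findall("dax:job", NS)]
    parents: dict[str, set[str]] = defaultdict(set)
    # Part 3 of a DAX: <child ref="..."> holding one or more <parent ref="..."/>
    for child in root.findall("dax:child", NS):
        for parent in child.findall("dax:parent", NS):
            parents[child.get("ref")].add(parent.get("ref"))
    levels: list[list[str]] = []
    done: set[str] = set()
    remaining = jobs
    while remaining:
        ready = [j for j in remaining if parents[j] <= done]
        if not ready:
            raise ValueError("unsatisfiable dependencies (cycle or undefined parent)")
        levels.append(ready)
        done.update(ready)
        remaining = [j for j in remaining if j not in done]
    return levels
```

For the sample above this gives three levels: Job1_id; then Job2_id and Job3_id; then Job4_id. DAGMan itself does not advance level by level; it submits each job as soon as that job's own parents have finished.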


Provenance Challenge Workflow in Kepler/Pegasus

In Kepler, each node needs a unique name, so the Transformation Catalog (TC) needs many duplicate entries
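To see why the catalog grows, take the First Provenance Challenge workflow, which runs align_warp four times, reslice four times, softmean once, and slicer and convert three times each. Assuming, hypothetically, that each uniquely named Kepler node is used directly as its logical transformation name, the TC needs one row per node even though only five executables exist. The function names, site name, and `/opt/bin` paths below are made up for illustration.

```python
from collections import Counter

# Stage multiplicities of the First Provenance Challenge workflow
STAGES = {"align_warp": 4, "reslice": 4, "softmean": 1, "slicer": 3, "convert": 3}


def kepler_nodes(stages: dict[str, int]) -> dict[str, str]:
    """Unique Kepler node name -> executable it runs, e.g. align_warp_1 to align_warp_4.
    The /opt/bin paths are made-up placeholders."""
    return {f"{prog}_{i}": f"/opt/bin/{prog}"
            for prog, count in stages.items() for i in range(1, count + 1)}


def tc_entries(nodes: dict[str, str], site: str) -> list[tuple[str, str, str]]:
    """One (logical transformation, site, physical path) row per node, assuming the
    node name is used directly as the logical transformation name."""
    return [(name, site, path) for name, path in nodes.items()]


def duplicated_paths(entries: list[tuple[str, str, str]]) -> dict[str, int]:
    """Physical executables registered under more than one logical name."""
    counts = Counter(path for _, _, path in entries)
    return {path: n for path, n in counts.items() if n > 1}


if __name__ == "__main__":
    rows = tc_entries(kepler_nodes(STAGES), "site_a")
    print(len(rows), "TC rows for", len({p for _, _, p in rows}), "executables")
```

Under these assumptions `tc_entries` yields 15 rows for 5 executables, and `duplicated_paths` reports every stage except softmean.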


Integration Benefits for Pegasus Users

Visualizing/debugging existing models: support a scientist who wants to redo, visualize, or easily reconfigure an existing DAX

Provide an option to upload existing DAX files into the workspace

Convert the specified DAX file into MoML (Kepler's format) by passing it through an XSLT processor, which generates the required directors and actors on the canvas

Scalability issues: only small workflows can be visualized, so scoping may need to be applied
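The conversion described above runs through an XSLT processor. To show the same mapping without a stylesheet, the sketch below swaps in a plain Python walk with `xml.etree.ElementTree`: each DAX job becomes a MoML `<entity>`, each file becomes a `<relation>`, and each `<uses>` becomes a `<link>` to the job's input or output port. `ptolemy.actor.TypedCompositeActor` and `ptolemy.actor.TypedIORelation` are standard Ptolemy II classes, but the director and job class names are hypothetical placeholders, and port declarations are assumed to come from the job actor class.

```python
import xml.etree.ElementTree as ET

NS = {"dax": "http://www.griphyn.org/chimera/DAX"}
# Hypothetical placeholders: not the real Kepler/Pegasus class names
DIRECTOR_CLASS = "hypothetical.PegasusDirector"
JOB_CLASS = "hypothetical.PegasusJob"


def dax_to_moml(dax_xml: str) -> ET.Element:
    """Build a skeleton MoML model: one entity per job, one relation per file."""
    dax = ET.fromstring(dax_xml)
    model = ET.Element("entity", {"name": dax.get("name", "workflow"),
                                  "class": "ptolemy.actor.TypedCompositeActor"})
    ET.SubElement(model, "property", {"name": "Pegasus Director", "class": DIRECTOR_CLASS})
    relations: dict[str, str] = {}      # file name -> relation name
    links: list[tuple[str, str]] = []   # (actor.port, relation), written last
    for job in dax.findall("dax:job", NS):
        # DAX job ids are unique, while names may repeat; Kepler needs unique entity names
        actor = job.get("id")
        ET.SubElement(model, "entity", {"name": actor, "class": JOB_CLASS})
        for use in job.findall("dax:uses", NS):
            # Anything other than link="input" is treated as an output here
            port = "input" if use.get("link") == "input" else "output"
            relation = relations.setdefault(use.get("file"), f"r{len(relations)}")
            links.append((f"{actor}.{port}", relation))
    for relation in relations.values():
        ET.SubElement(model, "relation", {"name": relation,
                                          "class": "ptolemy.actor.TypedIORelation"})
    for port, relation in links:
        ET.SubElement(model, "link", {"port": port, "relation": relation})
    return model
```

Because a file shared by a producer and its consumers becomes a single relation, the multiport wiring falls out of the file names alone; the model still grows with every job and file, which is consistent with the scalability concern above.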


Integration Issues

Kepler acts as a visual programming environment: actors represent single units of computation, with data flowing among them

Some configuration is not intuitive (TC entries)

Kepler has no separate representation of files, so each job has multiport I/O ports, and the user can connect as many files going into and coming out of each port as needed

Potential use of the integrated environment for debugging

Not done: integration with the Pegasus data registry, monitoring of execution in Kepler, use of Kepler's workflow execution engine, and support for Kepler actors in Pegasus


Relevant Links

Kepler: http://kepler-project.org

Pegasus: http://pegasus.isi.edu

DAGMan: www.cs.wisc.edu/condor/dagman/

Provenance challenge: http://twiki.ipaw.info/bin/view/Challenge/ (workshop on Tuesday)

NSF workshop on Challenges of Scientific Workflows: www.isi.edu/nsf-workflows06