Accessing Grid Resources via Portals and Workflow Tools

Sriram Krishnan, Ph.D., sriram@sdsc.edu

NBCR Grid

Figure: NBCR Grid architecture. Components shown: clients (Gemstone, PMV/Vision, Kepler); Application Services; Security Services (GAMA); State Management; and a Condor pool, an SGE cluster, and a PBS cluster, each accessed through Globus.

User Interfaces: Gemstone

User Interfaces: AutoDockTools (ADT), PMV

User Interfaces: What is a Portal?

• “A portal is a web based application that commonly provides personalization, single sign on, content aggregation from different sources and hosts the presentation layer of Information Systems” (JSR 168)

• Grid/Science Portals build upon the familiar model of Web portals such as Yahoo or Amazon to deliver the benefits of Grid computing to virtual communities of users, providing a single access point to Grid services and resources.

User Interfaces: Portals

• Pros
  – Ubiquitous access to applications
  – No need to install complex software
• Cons
  – Limited interaction with local desktop tools
  – Interfaces may not be rich enough for complex tasks such as visualization
  – Not very easy to make highly interactive interfaces

User Interfaces: The CAMERA Labs Portal

CAMERA Labs Demo

Portal Technology

• Built on top of the GridSphere Portal Framework
  – http://www.gridsphere.org
• JSR 168 Portlet API compliant
  – Similar to the Servlet API in providing reusable Web applications
  – Ratified in August 2003 by vendors including BEA, Sun, IBM, Oracle, Plumtree, etc.

What is a Portlet?

• Standardized packaging model for sharing portlet applications among portal vendors
• Builds on the Servlet API and specification, so there are no major surprises for existing Java portal developers
• Supports window states (e.g., minimized, maximized) and portlet modes (e.g., view, edit, help), much like a desktop environment
• API provides useful methods for storing per-user data and configuration settings
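To make the portlet contract concrete, here is a minimal Python sketch. It is not the JSR 168 API (that API is Java, in the `javax.portlet` package); it only mimics two ideas from the slide: output that depends on the window state and portlet mode, and preferences stored per user. The `JobStatusPortlet` class and its methods are hypothetical.

```python
from enum import Enum


class WindowState(Enum):
    NORMAL = "normal"
    MAXIMIZED = "maximized"
    MINIMIZED = "minimized"


class PortletMode(Enum):
    VIEW = "view"
    EDIT = "edit"
    HELP = "help"


class JobStatusPortlet:
    """Hypothetical portlet listing a user's Grid jobs (illustration only)."""

    def __init__(self):
        # Per-user preferences keyed by user name (stands in for portal storage).
        self.preferences = {}

    def set_preference(self, user, key, value):
        self.preferences.setdefault(user, {})[key] = value

    def render(self, user, jobs, mode=PortletMode.VIEW, state=WindowState.NORMAL):
        """Return an HTML fragment; the portal aggregates fragments into a page."""
        limit = self.preferences.get(user, {}).get("max_jobs", 5)
        if state is WindowState.MINIMIZED:
            return ""  # in this sketch, a minimized portlet contributes no content
        if mode is PortletMode.HELP:
            return "<p>Lists your most recent Grid jobs.</p>"
        if mode is PortletMode.EDIT:
            return f"<form>max_jobs: <input value='{limit}'></form>"
        if state is WindowState.MAXIMIZED:
            limit = len(jobs)  # more room on screen: show every job
        rows = "".join(f"<li>{job}</li>" for job in jobs[:limit])
        return f"<ul>{rows}</ul>"


portlet = JobStatusPortlet()
portlet.set_preference("alice", "max_jobs", 2)
jobs = ["job-1", "job-2", "job-3"]
print(portlet.render("alice", jobs))  # <ul><li>job-1</li><li>job-2</li></ul>
```

In a real portal, the container rather than the portlet decides the current window state and mode, and preferences persist in portal storage instead of a dictionary.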

What makes GridSphere different?

• Many other open-source portals already exist:
  – Jetspeed2, uPortal, StringBeans, Exo, Liferay, JBoss
• A handy template build system using Apache Ant:
  – ant new-project
• Lightweight: no EJB; based on popular, robust libraries
  – e.g., Hibernate for persistence
• Visual UI tags and beans make presentation development much easier
• Support for the Grid!
  – GridPortlets offered as an add-on webapp
  – Provides a library and a collection of portlets for:
    • Credential support, job launch (GRAM), data transfer (GridFTP)
• Used by several cyberinfrastructure projects such as BIRN, NBCR, GEON, and CAMERA
  – Lots of reusable software!

Advanced Usage: Workflows

• Need for automation of processes (scientific or otherwise)
  – An end-to-end application is typically more than a single application run
  – Must be reproducible and maintainable
  – Should be easy to compose from individual components

Workflow Scenario: Business

Figure: a client buys a ticket through a travel agent, which works with airline A, airline B, a bank/CC service, and a delivery service (diagram labels: buy a ticket, tickets, arrive, confirm).

Scientific Workflows: Phylogeny Analysis

Figure: Local Disk -> Multiple Sequence Alignment -> Phylogeny Analysis -> Tree Visualization
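The phylogeny pipeline on this slide can be sketched in Python as a chain of stages, each consuming the previous stage's output. This is a toy under stated assumptions: the sequences are hypothetical and already aligned, so the alignment stage only checks lengths instead of running a real aligner, and the phylogeny stage uses UPGMA on p-distances, a method chosen here for brevity rather than taken from the original workflow.

```python
from itertools import combinations


def read_sequences():
    # Stand-in for the "Local Disk" step: hypothetical, pre-aligned sequences.
    return {"A": "ACGTACGT", "B": "ACGTACGA", "C": "ACGAACTA", "D": "TCGAACTA"}


def align(seqs):
    # Stand-in for multiple sequence alignment: this sketch only checks that
    # the input is already aligned (all sequences have equal length).
    if len({len(s) for s in seqs.values()}) != 1:
        raise ValueError("this sketch expects pre-aligned sequences")
    return seqs


def p_distance(a, b):
    # Fraction of aligned sites at which two sequences differ.
    return sum(x != y for x, y in zip(a, b)) / len(a)


def upgma(seqs):
    # Phylogeny stage via UPGMA: repeatedly join the closest pair of clusters;
    # the new cluster's distance to each other cluster is the size-weighted
    # average of its members' distances.
    clusters = {name: (name, 1) for name in seqs}  # id -> (Newick text, size)
    dist = {frozenset(pair): p_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(seqs, 2)}
    while len(clusters) > 1:
        closest = min(dist, key=dist.get)
        a, b = sorted(closest)
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        del dist[closest]
        merged = f"{a}|{b}"
        for other in clusters:
            d_a = dist.pop(frozenset((a, other)))
            d_b = dist.pop(frozenset((b, other)))
            dist[frozenset((merged, other))] = (d_a * n_a + d_b * n_b) / (n_a + n_b)
        clusters[merged] = (f"({tree_a},{tree_b})", n_a + n_b)
    (tree, _), = clusters.values()
    return tree + ";"


def visualize(newick):
    # Stand-in for tree visualization: print the tree in Newick format.
    print(newick)
    return newick


# Run the stages in the order the workflow diagram connects them.
result = visualize(upgma(align(read_sequences())))  # prints ((A,B),(C,D));
```

Because each stage only depends on its input, any stage (for example, the stand-in aligner) can be swapped for a real tool without touching the others, which is the composability the workflow slides emphasize.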

Scientific Workflow Systems

• Combination of data integration, analysis, and visualization steps into a larger, automated "scientific process"
• Mission of scientific workflow systems
  – Promote “scientific discovery” by providing tools and methods to generate scientific workflows
  – Create an extensible and customizable graphical user interface for scientists from different scientific domains
  – Support computational experiment creation, execution, sharing, reuse, and provenance
  – Design frameworks that define efficient ways to connect to existing data and integrate heterogeneous data from multiple resources

Why not just a Python script?

• End users who define, reuse, modify, and specialize workflows would find visual interfaces much easier to use than scripts
  – It is typically also possible to compile scripts from designed workflows
• Other advantages:
  – Modular reuse, application interoperability
  – Debugging and monitoring
  – Automated data management (e.g., provenance)
  – Validation (e.g., data, structural, and semantic typing)
• From integrated modeling to execution, optimization, and archival
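Two of the points above, validation by structural typing and compiling a script from a designed workflow, can be sketched together. Here the workflow is plain data (actors with typed ports, plus links between ports), which a tool can check before running and then translate into a linear script. The actor names, port types, and generated call style are hypothetical; the compiler assumes each actor has at most one output port, and `graphlib` requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow definition: each actor declares typed ports.
ACTORS = {
    "read": {"inputs": {}, "outputs": {"seqs": "fasta"}},
    "align": {"inputs": {"seqs": "fasta"}, "outputs": {"aln": "alignment"}},
    "tree": {"inputs": {"aln": "alignment"}, "outputs": {"tree": "newick"}},
    "show": {"inputs": {"tree": "newick"}, "outputs": {}},
}
# Links: (source actor, output port) -> (target actor, input port).
LINKS = [
    (("read", "seqs"), ("align", "seqs")),
    (("align", "aln"), ("tree", "aln")),
    (("tree", "tree"), ("show", "tree")),
]


def validate(actors, links):
    """Structural typing: every link must join ports of the same type."""
    errors = []
    for (src, out_port), (dst, in_port) in links:
        out_type = actors[src]["outputs"][out_port]
        in_type = actors[dst]["inputs"][in_port]
        if out_type != in_type:
            errors.append(f"{src}.{out_port} ({out_type}) -> {dst}.{in_port} ({in_type})")
    return errors


def compile_to_script(actors, links):
    """Emit one function call per actor, in dependency (topological) order."""
    deps = {name: set() for name in actors}
    for (src, _), (dst, _) in links:
        deps[dst].add(src)
    lines = []
    for name in TopologicalSorter(deps).static_order():
        args = ", ".join(f"{in_port}={src}_{out_port}"
                         for (src, out_port), (dst, in_port) in links if dst == name)
        outs = list(actors[name]["outputs"])  # assumes at most one output port
        target = f"{name}_{outs[0]} = " if outs else ""
        lines.append(f"{target}{name}({args})")
    return "\n".join(lines)


print(validate(ACTORS, LINKS))  # []
print(compile_to_script(ACTORS, LINKS))
# read_seqs = read()
# align_aln = align(seqs=read_seqs)
# tree_tree = tree(aln=align_aln)
# show(tree=tree_tree)
```

Keeping the workflow as data rather than code is what makes both steps possible: the same definition can be checked, drawn in a GUI, or turned into a script.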

Kepler: A Scientific Workflow System

• Builds upon the open-source Ptolemy II framework
  – Ptolemy II: a laboratory for investigating design
  – KEPLER: a problem-solving environment for scientific workflows
  – KEPLER = “Ptolemy II + X” for scientific workflows
• 1st beta release (June 2, 2006)
• www.kepler-project.org

Actor-Oriented Design

• Actor
  – Encapsulation of parameterized actions
  – Interface defined by ports and parameters
• Port
  – Carries input and output data between actors
  – Without call-return semantics
• Model of computation
  – Communication semantics among ports
  – Flow of control
  – Implementation is a framework

Figure: Actors as processing components
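The actor, port, and parameter vocabulary can be illustrated with a small Python sketch. Kepler actors are Java classes built on Ptolemy II, so this is only a conceptual analogue; `Port`, `Actor`, and the `Scale` actor are hypothetical. Note that `fire()` pushes tokens out through an output port instead of returning a value to a caller, which is the "without call-return semantics" point.

```python
from collections import deque


class Port:
    """A named port; an output port may be linked to several input ports."""

    def __init__(self, name):
        self.name = name
        self.queue = deque()   # tokens waiting at this (input) port
        self.receivers = []    # input ports linked to this (output) port

    def connect(self, other):
        self.receivers.append(other)

    def send(self, token):
        # Tokens are pushed to receivers; nothing is returned to the sender.
        for port in self.receivers:
            port.queue.append(token)


class Actor:
    """Encapsulates a parameterized action behind ports and parameters."""

    def __init__(self, name, **parameters):
        self.name = name
        self.parameters = parameters
        self.inputs, self.outputs = {}, {}

    def add_input(self, name):
        self.inputs[name] = Port(name)

    def add_output(self, name):
        self.outputs[name] = Port(name)

    def fire(self):
        raise NotImplementedError


class Scale(Actor):
    """Hypothetical actor: multiplies each input token by its 'factor' parameter."""

    def __init__(self, name, factor):
        super().__init__(name, factor=factor)
        self.add_input("in")
        self.add_output("out")

    def fire(self):
        token = self.inputs["in"].queue.popleft()
        self.outputs["out"].send(token * self.parameters["factor"])


double = Scale("double", factor=2)
triple = Scale("triple", factor=3)
double.outputs["out"].connect(triple.inputs["in"])
sink = Port("sink")
triple.outputs["out"].connect(sink)

double.inputs["in"].queue.append(7)
double.fire()
triple.fire()
print(list(sink.queue))  # [42]
```

Here the test code decides when each actor fires; in Kepler that decision belongs to the director, which is the subject of the Directors slide.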

Available Actors

• Generic Web Service client and Web Service harvester
• Customizable RDBMS query and update
• Command-line wrapper tools (local, ssh, scp, ftp, etc.)
• Some Grid actors
  – Globus job runner, GridFTP-based file access, proxy certificate generator
• SRB support
• Imaging and visualization support
• Textual and graphical output
• Some domain-specific actors for geosciences and bioinformatics
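A command-line wrapper actor essentially runs an external program, passes it data from an input port, and emits the program's output. The Python sketch below shows only that core action for a local command; the function name `command_line_actor` is hypothetical, and the remote variants (ssh, scp, ftp) are not shown. The wrapped "tool" is a one-line Python program so that the example runs on any platform.

```python
import subprocess
import sys


def command_line_actor(command, token, timeout=60):
    """Hypothetical wrapper: run a local program with the input token as its
    last argument and emit the program's standard output."""
    completed = subprocess.run(
        command + [str(token)],
        capture_output=True,
        text=True,
        timeout=timeout,
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout.strip()


# The "tool": a tiny Python program that upper-cases its first argument.
tool = [sys.executable, "-c", "import sys; print(sys.argv[1].upper())"]
print(command_line_actor(tool, "acgt"))  # ACGT
```

Using `check=True` surfaces tool failures as exceptions, which a workflow engine can record in its error and execution logs rather than silently passing on empty output.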

Directors: Definition of Workflow Semantics

• Implement different computational models
• Define the semantics of
  – execution of actors and workflows
  – interactions between actors
• Kepler is extending Ptolemy directors with specialized ones for Web-service-based workflows and distributed workflows
• Examples of computational models: Process Networks, Rendezvous, Publish and Subscribe, Continuous Time, Finite State Machines, Dataflow, Time Triggered, Synchronous/Reactive, Discrete Event, Wireless
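The key idea is that a director, not the actors, decides execution semantics. In the Python sketch below, the same stage functions run under two hypothetical directors: a process-network director, in which every actor is a thread and channels are FIFO queues with blocking reads, and a simple sequential director that fires each stage over the whole stream in turn. Neither is Kepler's implementation; both are illustrations.

```python
import queue
import threading

STOP = object()  # end-of-stream marker


def source(out_q, items):
    for item in items:
        out_q.put(item)
    out_q.put(STOP)


def stage(in_q, out_q, fn):
    # Blocking read: the thread waits until a token arrives on its channel.
    while (token := in_q.get()) is not STOP:
        out_q.put(fn(token))
    out_q.put(STOP)


def sink(in_q, results):
    while (token := in_q.get()) is not STOP:
        results.append(token)


def process_network_director(items, functions):
    """Each actor runs in its own thread; channels are unbounded FIFO queues."""
    channels = [queue.Queue() for _ in range(len(functions) + 1)]
    results = []
    threads = [threading.Thread(target=source, args=(channels[0], items))]
    threads += [threading.Thread(target=stage, args=(channels[i], channels[i + 1], fn))
                for i, fn in enumerate(functions)]
    threads.append(threading.Thread(target=sink, args=(channels[-1], results)))
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results


def sequential_director(items, functions):
    """Fire each stage over the whole stream, one stage after another."""
    tokens = list(items)
    for fn in functions:
        tokens = [fn(token) for token in tokens]
    return tokens


stages = [lambda x: x + 1, lambda x: x * 10]
print(process_network_director([1, 2, 3], stages))  # [20, 30, 40]
print(sequential_director([1, 2, 3], stages))       # [20, 30, 40]
```

For this linear pipeline both directors produce the same result; they differ in when stages run (concurrently versus one after another), which is exactly the kind of choice a director encapsulates.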

Dataflow as a Computation Model

• Dataflow: an abstract representation of how data flows through the system
• A dataflow program is a graph
  – Nodes represent operations; edges represent data paths
• Sound, simple, powerful model of parallel computation
  – NOT having a locus of control makes it simple!
  – Naturally distributed model of computation:
    • Asynchronous: many actors can be ready to fire simultaneously
    • Execution ("firing") of a node starts when (matching) data is available at the node's input ports
    • Locally controlled events: events correspond to the "firing" of an actor
  – An actor may be a single instruction or a sequence of instructions
  – Actors fire when all their inputs are available
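The firing rule can be sketched as a small interpreter: a node becomes ready when every input edge holds a token, and any ready node may fire. The hypothetical graph below computes `(a + b) * (a - b)`; `add` and `sub` are ready at the same time and could fire in parallel, but this sketch fires ready nodes one at a time in FIFO order for determinism. The graph format is an assumption and requires each input edge to come from a distinct node.

```python
from collections import deque

# Hypothetical dataflow graph for (a + b) * (a - b):
# node -> (operation, list of producer nodes feeding its input ports).
GRAPH = {
    "a": (None, []),
    "b": (None, []),
    "add": (lambda x, y: x + y, ["a", "b"]),
    "sub": (lambda x, y: x - y, ["a", "b"]),
    "mul": (lambda x, y: x * y, ["add", "sub"]),
}


def run_dataflow(graph, inputs):
    # One FIFO buffer per edge (producer, consumer).
    edges = {(src, dst): deque() for dst, (_, srcs) in graph.items() for src in srcs}
    consumers = {name: [dst for (src, dst) in edges if src == name] for name in graph}
    values, fired, ready = {}, [], deque()

    def emit(name, value):
        values[name] = value
        for dst in consumers[name]:
            edges[(name, dst)].append(value)
            # Firing rule: a node is ready once every input edge holds a token.
            srcs = graph[dst][1]
            if all(edges[(src, dst)] for src in srcs) and dst not in ready:
                ready.append(dst)

    for name, value in inputs.items():
        emit(name, value)
    while ready:
        node = ready.popleft()  # any ready node may fire; FIFO keeps runs repeatable
        op, srcs = graph[node]
        fired.append(node)
        emit(node, op(*(edges[(src, node)].popleft() for src in srcs)))
    return values, fired


values, fired = run_dataflow(GRAPH, {"a": 5, "b": 3})
print(values["mul"], fired)  # 16 ['add', 'sub', 'mul']
```

There is no program counter walking the graph: execution order emerges entirely from which tokens have arrived, which is the "no locus of control" property on the slide.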

Vergil is the GUI for Kepler

• Actor ontology and semantic search for actors
• Search -> drag and drop -> link via ports
• Metadata-based search for datasets

Figure: Actor Search and Data Search panels

Actor Search

• Kepler actor ontology
• Used in searching actors and creating conceptual views (= folders)

Currently, more than 200 Kepler actors have been added!
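Ontology-based actor search can be sketched as follows: each actor is annotated with a concept, concepts form a hierarchy, and a search for a concept also returns actors annotated with any of its subconcepts. Running the same query once per concept yields folder-like conceptual views. The concept names and actor names here are hypothetical, not Kepler's actual ontology.

```python
# Hypothetical ontology fragment: concept -> parent concept.
ONTOLOGY = {
    "DataAccess": "Actor",
    "RemoteFileAccess": "DataAccess",
    "DatabaseQuery": "DataAccess",
    "JobExecution": "Actor",
    "GridJob": "JobExecution",
}

# Hypothetical annotations: actor name -> ontology concept.
ACTOR_CONCEPTS = {
    "GridFTPReader": "RemoteFileAccess",
    "SQLQuery": "DatabaseQuery",
    "GlobusJobRunner": "GridJob",
    "LocalCommand": "JobExecution",
}


def lineage(concept):
    """Yield a concept followed by all of its superconcepts."""
    while concept is not None:
        yield concept
        concept = ONTOLOGY.get(concept)


def semantic_search(query):
    """Actors annotated with the query concept or any of its subconcepts."""
    return sorted(actor for actor, concept in ACTOR_CONCEPTS.items()
                  if query in lineage(concept))


def conceptual_views(concepts):
    """One folder per concept, as in a browsable actor library."""
    return {concept: semantic_search(concept) for concept in concepts}


print(conceptual_views(["DataAccess", "JobExecution"]))
# {'DataAccess': ['GridFTPReader', 'SQLQuery'], 'JobExecution': ['GlobusJobRunner', 'LocalCommand']}
```

Unlike a keyword search, a query for `DataAccess` finds `SQLQuery` even though the two names share no words; the match comes from the hierarchy.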

Kepler Provenance Framework

• OPTIONAL!
  – Modeled as a separate concern in the system
  – Listens to the execution and saves information customized by a set of parameters:
    • Context: who, what, where, when, and why associated with the run
    • Input data and its associated metadata
    • Workflow outputs and intermediate data products
    • Workflow definition (entities, parameters, connections): a specification of what exists in the workflow, which can have a context of its own
    • Information about the workflow evolution (the workflow trail)
• Types of provenance information:
  – Data provenance
    • Intermediate and end results, including files and database references
  – Process provenance
    • Keep the workflow definition with the data and parameters used in the run
  – Error and execution logs
  – Workflow design provenance
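Treating provenance as a separate, optional concern can be sketched with the observer pattern: the engine emits events, and a listener, if one is attached, records context, the workflow definition, inputs, intermediate products, outputs, and errors. The Python classes, event names, and the two example steps are hypothetical and do not reflect Kepler's actual provenance API.

```python
import platform
from datetime import datetime, timezone


class ProvenanceListener:
    """Hypothetical listener that records every event it is notified of."""

    def __init__(self):
        self.records = []

    def on_event(self, kind, **details):
        self.records.append({"kind": kind, **details})


class Workflow:
    def __init__(self, name, steps, parameters):
        self.name, self.steps, self.parameters = name, steps, parameters
        self.listeners = []  # provenance is optional: this may stay empty

    def notify(self, kind, **details):
        for listener in self.listeners:
            listener.on_event(kind, **details)

    def run(self, data, user):
        # Context (who, where, when) and the workflow definition itself.
        self.notify("context", user=user, host=platform.node(),
                    started=datetime.now(timezone.utc).isoformat())
        self.notify("definition", workflow=self.name,
                    steps=[step.__name__ for step in self.steps],
                    parameters=dict(self.parameters))
        self.notify("input", value=data)
        for step in self.steps:
            try:
                data = step(data, **self.parameters.get(step.__name__, {}))
            except Exception as exc:
                self.notify("error", step=step.__name__, message=str(exc))
                raise
            self.notify("intermediate", step=step.__name__, value=data)
        self.notify("output", value=data)
        return data


def trim(seq, length=4):
    return seq[:length]


def reverse(seq):
    return seq[::-1]


workflow = Workflow("demo", [trim, reverse], {"trim": {"length": 3}})
listener = ProvenanceListener()
workflow.listeners.append(listener)
result = workflow.run("ACGTAC", user="alice")
print(result)  # GCA
print([record["kind"] for record in listener.records])
```

Because the steps never reference the listener, detaching it leaves the workflow's behavior unchanged, which is what "modeled as a separate concern" buys.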

Kepler Provenance Recording Utility

• Parametric and customizable
  – Different report formats
  – Variable levels of detail
    • Verbose-all, verbose-some, medium, on error
  – Multiple cache destinations
• Saves information on
  – User name, date, run, etc.
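The recording options can be sketched as a recorder configured with a detail level, a report format, and a list of destinations. The slide names the levels but not what each one keeps, so the mapping in `LEVELS` is an assumption, as are the event kinds (they match no real Kepler format). Destinations are simply callables; here, an in-memory list and an in-memory text file.

```python
import io
import json

# Assumed mapping from detail level to the event kinds it keeps.
LEVELS = {
    "verbose-all": {"context", "definition", "input", "intermediate",
                    "output", "error"},
    "verbose-some": {"context", "definition", "input", "output", "error"},
    "medium": {"context", "output", "error"},
    "on-error": {"error"},
}


class ProvenanceRecorder:
    def __init__(self, level, destinations, fmt="json"):
        self.kinds = LEVELS[level]
        self.destinations = destinations  # callables that accept one line
        self.fmt = fmt

    def format(self, event):
        if self.fmt == "json":
            return json.dumps(event, sort_keys=True)
        # Plain-text report format: sorted key=value pairs.
        return " ".join(f"{key}={value}" for key, value in sorted(event.items()))

    def record(self, event):
        if event["kind"] not in self.kinds:
            return  # filtered out at this level of detail
        line = self.format(event)
        for write in self.destinations:
            write(line)


memory = []
report = io.StringIO()
recorder = ProvenanceRecorder(
    "medium", [memory.append, lambda line: report.write(line + "\n")], fmt="text")
for event in [{"kind": "context", "user": "alice"},
              {"kind": "intermediate", "step": "trim", "value": "ACG"},
              {"kind": "output", "value": "GCA"}]:
    recorder.record(event)
print(memory)  # ['kind=context user=alice', 'kind=output value=GCA']
```

Filtering happens once, before formatting, so adding another destination (a file or a database writer) does not change what gets recorded, only where it goes.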

Kepler Basics: Hello World Demo

Advanced Kepler: MEME-MAST Workflow

Advantages of Scientific Workflow Systems

• Formalization of the scientific process
• Easy to share, adapt, and reuse
  – Deployable, customizable, extensible
• Management of complexity and usability
  – Support for hierarchical composition
  – Interfaces to different technologies from a unified interface
  – Can be annotated with domain knowledge
• Tracking provenance of the data and processes
  – Keep the association of results to processes
  – Make it easier to validate/regenerate results and processes
  – Enable comparison between different workflow versions
• Execution monitoring and fault tolerance
• Interaction with multiple tools and resources at once

Summary

• Presented ways to access Grid applications via portals and workflow tools
• References
  – PMV, ADT: http://mgltools.scripps.edu/
  – CAMERA: http://camera.calit2.net
  – GridSphere: http://www.gridsphere.org
  – Kepler: http://www.kepler-project.org

Acknowledgements

• CAMERA labs portal built in conjunction with the rest of the CAMERA team

• Several slides borrowed from Kepler tutorials presented by Ilkay Altintas