workflows, taverna, and teragrid

Download Workflows, Taverna, and Teragrid

Post on 15-Jan-2016




0 download

Embed Size (px)


Workflows, Taverna, and Teragrid. Annual Meeting, 2008. Agenda. Kiran Keshav Workflow Working Group Ravi Madduri Taverna and Workflows Christine Hung TeraGrid. Background - What Does This Look Like?. Data Service A. Analytical Service B. - PowerPoint PPT Presentation


  • Workflows, Taverna, and TeragridAnnual Meeting, 2008

  • AgendaKiran KeshavWorkflow Working GroupRavi MadduriTaverna and WorkflowsChristine HungTeraGrid

  • Background - What Does This Look Like?This includes run A, get results from A, input results from A into B, get results from B, etc.

  • Atomic UnitsIn some cases, you may want to run a series of tasks as one (atomic) unit.Reasons:You need to run the same tasks in the same order multiple times.Each individual task is long running, and you dont want to wait for one task to finish before you start the next.Tasks only make sense in the context of other tasks.Enter workflows:A sequence of operations executed as a single unit of work.The sequence of events can have logic:Aresult + Bresult C

  • Example Scenario

  • What Are We Trying to Accomplish?Execute grid analytical and data services to accomplish a pre-defined task.

    Authoring of workflows should be easy.

    Submission of workflows should be easy.

  • Authoring and Executing WorkflowsHow do you write a workflow?Authoring toolWho executes the workflow?Workflow engine

  • Authoring ToolFind caBIG servicesAdd services of interest to the workflowSubmit the workflow to the workflow engine

  • Workflow EngineExecutes the workflowCan apply logic to the workflowGet the results from Service A, add these to the results from Service B, then divide by 2.Take this new result pass this as input to Service C.Store intermediate data (radiology)

  • Workflows in 2007Identify a workflow authoring tool TavernaImplement a scientific workflowIdentify significant barriers to workflow interoperability

  • Workflows in 2008Use case based:Identify services and corresponding domains of caGrid data and analytical servicesImplement 2-3 workflowsWorkflow best practicesTooling enhancementsTaverna enhancementsCompatibility review enhancements, etc.

  • AdministrativeWorkflow Working GroupKiran Keshavkeshav@c2b2.columbia.edu1st and 3rd Monday of every month at 11am EST

  • Taverna OverviewState of Workflow in caGrid 1.0BPELArchitecture overview of Workflow servicesExperienceICR Working GroupWorkflows construction should be easyInvestigation and choosing TavernaTavernaOverviewEarly workAdvantages Semantic ScavengerFuture work

  • caGrid background Where does workflow fit in?Orchestrates caGrid-build servicescaGrid security supportedcaGrid clients build/run workflowsIntegrates Semantically annotated data

  • Workflow in caGridHow does caGrid implement workflow?Workflow Factor Service (WFS)Grid service to create a new workflow Workflow ServiceGrid service to access your created workflowStart, pause, resume, cancel, getWorkflowOutputcaGrid integrationInvoke grid servicesSecurity (communication, message, conversation)caGrid implementationLeverages the ActiveBPEL workflow engineWorkflows exposed as web services in ActiveBPEL, wrapped as grid services Wraps the ActiveBPEL Admin ServiceWFS submits a BPR (workflow package)Accesses the created stateful web service

  • Workflow backgroundBPEL example

  • Investigation Conducted extensive tool review of existing tools: Prioritized feature list:

  • Results: Authoring Tool - TavernaTaverna: Ravi Madduri (Cagrid workflow point of contact) attended Taverna workshopPrototype of Taverna to discover and invoke caGrid services

  • Tavernas FeaturesExplicit modeling of data flow. Implicit iteration.Input/Output metadata. Taverna provides a plug-in called Feta, a semantic discovery tool, to support service discovery using the function description as well as input and output metadata, semantically described using the myGrid ontology. This metadata mechanism provides a feasible integration point with caGrid metadata and discovery services.

  • Tavernas Features (Cont..)Beanshell scripting and XML processing support. A custom processor that executes a beanshell (a flavor of dynamic Java) script can be defined inside a Taverna workflow. Beanshell scripts can be shared with others by wrapping them in a workflow and publishing the workflow on the Web. Taverna also provides a set of local processors supporting the processing of XML. By this means Taverna can leverage the processing power of Java and XML.Taverna is a nice tool which is easy to use for the non-IT specialists, and the Taverna workbench provides both build-time and run-time support for workflow modeling, debugging, tracing and provenance. This feature of Taverna is an obvious advantage over other solutions like BPEL, DAGMan, etc, because these languages and tools are designed for IT specialists and the domain users in caGrid have difficulties in using them.

  • GT4 Plug-in for TavernaProcessor: The concept of processor in a Taverna workflow is the same as the concept of task in a common workflow. ProcessorTaskWorker: Each processor interface corresponds to a ProcessorTaskWorker interface. ProcessorTaskWorker interface defines the action that is to be performed at runtime when workflow execution reaches to that processor.Scavenger: In Taverna, a scavenger is a logic collection of a set of processors. For example, a WSDL scavenger collects all the processors each of which corresponds to an operation defined inside that WSDL.

    Taverna Workbench

    GT4-Processor Plug-in




    1. Get service list

    2. Get operation (processor)

    3. Invoke processor

  • GT4 Plug-in for Taverna (contd)GT4 ScavengerA sample caGrid workflow

  • GT4 Plug-in for Taverna (contd)GT4 Scavenger with semantic/metadata based caGrid service queryA sample caGrid workflow

  • GT4 Plug-in for Taverna (contd)Execution trace of the sample caGrid workflow

  • Demo

  • Future WorkEnable Taverna to invoke Grid Services that use Resource patternEnable Taverna to invoke Secure Grid ServicesRemote Execution of WorkflowsIntegration of Taverna execution with caGrid workflow services

  • TeragridOverview (5 min)Introduction on TeraGrid Workgroup Background on geWorkbench and geWorkbench/caGrid/TeraGrid ProjectTechnology (10 min)Steps to establishing geWorkbench/caGrid/TeraGrid InterfaceUse of caGrid Security (GTS, Grid Grouper, Dorian, CDS)Workflow and communications between servicesDemo (5 min)Discussion (5 min)

  • Team MembersgeWorkbench (Columbia University)Christine HungKiran KeshavcaGrid (Ohio State University)Scott OsterStephen LangellacaGrid/TeraGrid (Argonne National Laboratory)Ravi MadduriTeraGrid (Argonne National Laboratory)Stuart MartinTeraGrid (Texas Advanced Computing Center)Stephen MockManagementAris Floratos (Columbia University)Krishnakant Shanbhag (Argonne National Laboratory)Michael Keller (Booz Allen Hamilton)Patrick McConnell (Duke University)Nancy Wilkins-Diehr (San Diego Supercomputer Center)

  • OverviewPrimary problem to addressLack of infrastructure and operating procedures to support high performance computing needs of caBIGOverarching goalsRegular caGrid services will run as caGrid/TeraGrid gateways services Virtualize TeraGrid resources (both compute and storage)Approach: labor divided between domain and technical tasksUse cases will be drafted to identify the needs of the communityExisting TeraGrid Gateway projects will be surveyed to identify lessons learned and potential technology for reuseDemonstrate approach through working prototypeDocument best practices and develop cookbook

  • TeraGrid OverviewCharacteristics:> 250 teraflops of computing capability >30 petabytes of online and archival data storagehigh-performance networksMechanics:Prospective users request allocation of HPC resources to a review committeeAllocations are granted, and credentials are issuedJobs are run with credentials and resource usage is billed to the allocationTeraGrid is an open scientific discovery infrastructure combining leadership class resources at nine partner sites to create an integrated, persistent computational resource.

  • caGrid Gateway Service OverviewcaGrid service running in the caBIG environment which acts as a bridge or proxy to TeraGrid resources for a subset of caBIG usersshould meet Gold compatibility requirementsCreated for a specific scientific scenario:abstracts away the details of leveraging TeraGrid for performance intensive operationsuses domain-specific operations and data typeshas access to TeraGrid allocationAlleviates the need for caBIG users to:understand the complexities of TeraGrid (or HPC systems)obtain TeraGrid accounts/allocations

  • geWorkbench a Platform for Integrated GenomicsIntegrated genomics analysis applicationSupport for gene expression data, sequences, pathways, and structure50+ visualization and analysis modulesAccess to local and remote data sources and analytical servicesIntegration with biological annotation sourcesDevelopment PlatformOpen sourceJava basedComponent architectureFacilitates customization

  • geWorkbench a Platform for Integrated GenomicsLarge collection of components

    Data parsers: Affy MAS/GCOS (txt and CEL), Genepix, RMA, FASTA, caArray, PDB.

    Data Management: Project folders, marker/sequence/array groups.

    Visualization: Dendrograms, color mosaics, scatter plots, SOM clusters, BLAST results, dot matrices.

    Analyses: Hierarchical clustering, t-test, SVM, ARACNE, MEDUSA. MatrixREDUCE