Programming Scientific and Distributed Workflow with Triana Services
Matthew Shields, GGF10 Workflow Workshop, 9th March
Matthew Shields, Cardiff University
Presentation Outline
Triana Overview
• Triana services and their distribution
• Distribution policies
• The GAP interface and its relation to the GridLab GAT
Scientific Workflow
• Binary inspiral algorithm example
Dynamic Distributed Workflow
• Service composition on the Grid
• Service usage: dynamically distributing a Triana workflow
Conclusion
Distributed Triana Work-flow
Figure: components shown are a Triana Controlling Service (TCS) with a Triana Engine, a Triana Gateway, several Triana Service & Engine nodes, other engines, and any GAP service (e.g. a Web service), connected over the network by action commands and workflow (e.g. BPEL4WS).
• Flexible distribution based around Triana Groups
• HPC and pipelined distribution
GAP Overview
• Based around a series of Java interface classes
• Concrete implementations form the GAP bindings
• The core interface is the
  - Service creation and discovery
  - Pipe creation and discovery
  - Message communication
  - Information
  - Job submission
  - Data management: transfers, logical lookup
• Will become an adapter for the GridLab Java GAT, providing:
  - Advertisement, discovery, deployment and communication of services
  - GRMS job submission adapter
  - Data management services
Figure: GAP architecture
• GAP (Java prototype): a set of generic Java interfaces, high-level abstractions to Grid services, and a factory design for dynamically pluggable services
• Bindings shown: P2PS, Jxta, Web Services, the GridLab GAT (www.gridlab.org) via the Java GAT prototype, and OGSA (planned); also Jxtaserve (GSI enabled), NS-2 and more
• Capabilities shown: advertising, discovery and communication; job submission (GRMS), i.e. generic job submission; data management with virtual filename data access
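To make the factory design with dynamically pluggable services concrete, here is a minimal Java sketch of several bindings hidden behind one interface. The names `DiscoveryService`, `GapFactory` and `InMemoryBinding` are hypothetical and are not the GAP API; the in-memory binding merely stands in for a real P2PS, Jxta or Web Services binding.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical sketch of a GAP-style factory: names are illustrative, not the real GAP API.
class GapFactorySketch {

    /** High-level abstraction over a discovery/communication middleware. */
    interface DiscoveryService {
        String bindingName();
        void advertise(String serviceName);
        boolean discover(String serviceName);
    }

    /** Toy in-memory binding standing in for a P2PS, Jxta or Web Services binding. */
    static final class InMemoryBinding implements DiscoveryService {
        private final String name;
        private final Set<String> advertised = new HashSet<>();

        InMemoryBinding(String name) { this.name = name; }

        @Override public String bindingName() { return name; }
        @Override public void advertise(String serviceName) { advertised.add(serviceName); }
        @Override public boolean discover(String serviceName) { return advertised.contains(serviceName); }
    }

    /** Factory: bindings are registered by name and instantiated on demand. */
    static final class GapFactory {
        private final Map<String, Supplier<DiscoveryService>> bindings = new HashMap<>();

        void register(String name, Supplier<DiscoveryService> supplier) {
            bindings.put(name, supplier);
        }

        DiscoveryService create(String name) {
            Supplier<DiscoveryService> supplier = bindings.get(name);
            if (supplier == null) {
                throw new IllegalArgumentException("No binding registered: " + name);
            }
            return supplier.get();
        }
    }

    public static void main(String[] args) {
        GapFactory factory = new GapFactory();
        factory.register("p2ps", () -> new InMemoryBinding("p2ps"));
        factory.register("webservices", () -> new InMemoryBinding("webservices"));

        // Application code depends only on the interface, not on the concrete binding.
        DiscoveryService discovery = factory.create("p2ps");
        discovery.advertise("FFT");
        System.out.println(discovery.bindingName() + ": " + discovery.discover("FFT"));
    }
}
```

Registering a different supplier under a name swaps the middleware without touching the calling code, which is the benefit the factory design aims for.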
Triana Prototype
• Distributed Triana prototype based around Triana Groups, i.e. aggregate tools
• Each group can be distributed
• Distribution policies:
  - HTC: high throughput/task farming
  - Pipeline: allows node-to-node communication
• Each service can be a gateway to finer granularities of distribution:
Figure: pipeline distribution and task-farming distribution across multiple Triana Services.
Triana Workflow
• Triana is inherently flow based
  - Data flow: data arriving at a component triggers its execution
  - Control flow: control commands trigger execution
• Decentralised execution
  - Data or control messages sent along communication “pipes” from sender to receiver cause the receiver to execute
  - Synchronous or asynchronous messaging (implementation dependent)
  - Multiple inputs can either block or trigger immediately (defined by the component designer)
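A minimal Java sketch of the data-flow firing rule, assuming a component holds one slot per input port: with a blocking policy it executes only once every port has data, while an immediate policy fires on any arrival. `Component` and `InputPolicy` are hypothetical names, not Triana classes, and messaging here is a plain synchronous method call.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Objects;
import java.util.function.Function;

// Hypothetical sketch of data-flow triggering; not Triana's actual classes.
class DataFlowSketch {

    /** How a component with several inputs reacts to data arriving. */
    enum InputPolicy { BLOCKING, IMMEDIATE }

    static final class Component {
        private final InputPolicy policy;
        private final Object[] inputs;                  // one slot per input port
        private final Function<Object[], Object> body;  // the unit of execution
        final List<Object> outputs = new ArrayList<>();

        Component(int ports, InputPolicy policy, Function<Object[], Object> body) {
            this.inputs = new Object[ports];
            this.policy = policy;
            this.body = body;
        }

        /** Called when a sender delivers data along a pipe; may trigger execution. */
        void receive(int port, Object data) {
            inputs[port] = data;
            boolean ready = policy == InputPolicy.IMMEDIATE
                    || Arrays.stream(inputs).allMatch(Objects::nonNull);
            if (ready) {
                outputs.add(body.apply(inputs.clone()));
                Arrays.fill(inputs, null); // inputs are consumed once the component fires
            }
        }
    }

    public static void main(String[] args) {
        Component adder = new Component(2, InputPolicy.BLOCKING,
                in -> (Integer) in[0] + (Integer) in[1]);
        adder.receive(0, 3); // blocks: port 1 has no data yet
        adder.receive(1, 4); // both ports filled, so the component fires
        System.out.println(adder.outputs); // prints [7]
    }
}
```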
Components and Definitions
• A component is the unit of execution
• Components are defined in XML files, which hold:
  - Naming information
  - Input and output ports
  - Parameter information
• Why components? To simplify the application design process and to speed up application development
• The component model provides an infrastructure for the interaction of components
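The sketch below reads naming, port and parameter information from a component definition using the standard `javax.xml.parsers` DOM API. The element and attribute names (`component`, `input`, `output`, `param`, `type`) are assumptions for illustration, not Triana's actual tool-definition schema.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical component-definition reader; the XML schema is illustrative only.
class ComponentDefinitionSketch {

    record ComponentDef(String name, List<String> inputs, List<String> outputs, Map<String, String> params) {}

    static ComponentDef parse(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element root = doc.getDocumentElement();
        return new ComponentDef(root.getAttribute("name"),
                attributes(root, "input", "type"), attributes(root, "output", "type"), params(root));
    }

    /** Collects one attribute from every element with the given tag, in document order. */
    private static List<String> attributes(Element root, String tag, String attr) {
        List<String> out = new ArrayList<>();
        NodeList nodes = root.getElementsByTagName(tag);
        for (int i = 0; i < nodes.getLength(); i++) {
            out.add(((Element) nodes.item(i)).getAttribute(attr));
        }
        return out;
    }

    /** Collects parameter name/value pairs. */
    private static Map<String, String> params(Element root) {
        Map<String, String> out = new LinkedHashMap<>();
        NodeList nodes = root.getElementsByTagName("param");
        for (int i = 0; i < nodes.getLength(); i++) {
            Element p = (Element) nodes.item(i);
            out.put(p.getAttribute("name"), p.getAttribute("value"));
        }
        return out;
    }

    public static void main(String[] args) throws Exception {
        String xml = """
            <component name="GaussianFFT">
              <input type="SampleSet"/>
              <output type="ComplexSpectrum"/>
              <param name="windowSize" value="1024"/>
            </component>
            """;
        System.out.println(parse(xml));
    }
}
```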
Taskgraph
• Internal object-based workflow graph representation
  - Taskgraph: a DAG of tasks and connections
• External XML representation
  - Simple XML syntax
  - List of participating task definitions
  - Parent/child connections
  - Hierarchical (compound components)
• Alternative languages and syntax, e.g. BPEL4WS, available through pluggable readers and writers
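As an illustration of a taskgraph held as a DAG, this sketch stores tasks and parent/child connections and derives one valid execution order with Kahn's algorithm, rejecting cycles. The class and tool names are hypothetical; Triana's own taskgraph classes differ.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical taskgraph sketch: tasks are nodes, connections are directed parent -> child edges.
class TaskgraphSketch {

    static final class Taskgraph {
        // Insertion-ordered adjacency list: task -> children.
        private final Map<String, List<String>> children = new LinkedHashMap<>();

        void addTask(String task) {
            children.putIfAbsent(task, new ArrayList<>());
        }

        void connect(String parent, String child) {
            addTask(parent);
            addTask(child);
            children.get(parent).add(child);
        }

        /** Kahn's algorithm: one valid execution order; fails if the graph has a cycle. */
        List<String> topologicalOrder() {
            Map<String, Integer> inDegree = new LinkedHashMap<>();
            for (String task : children.keySet()) inDegree.put(task, 0);
            for (List<String> cs : children.values()) {
                for (String c : cs) inDegree.merge(c, 1, Integer::sum);
            }

            Deque<String> ready = new ArrayDeque<>();
            inDegree.forEach((task, degree) -> { if (degree == 0) ready.add(task); });

            List<String> order = new ArrayList<>();
            while (!ready.isEmpty()) {
                String task = ready.poll();
                order.add(task);
                for (String c : children.get(task)) {
                    if (inDegree.merge(c, -1, Integer::sum) == 0) ready.add(c);
                }
            }
            if (order.size() != children.size()) {
                throw new IllegalStateException("taskgraph contains a cycle");
            }
            return order;
        }
    }

    public static void main(String[] args) {
        Taskgraph g = new Taskgraph();
        g.connect("Wave", "Gaussian");
        g.connect("Gaussian", "FFT");
        g.connect("FFT", "Grapher");
        System.out.println(g.topologicalOrder()); // prints [Wave, Gaussian, FFT, Grapher]
    }
}
```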
Workflow
• No explicit language support for control constructs
  - Loops and execution branching are handled by components
  - Loop component: controls a loop over a sub-workflow
  - Logical component: controls workflow branching
• Unlike BPEL4WS or similar languages
  - Flexibility of control, e.g. constraint-based loops
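Since loops and branching live in components rather than in the workflow language, a loop can be sketched as a component that reruns an enclosed sub-workflow until a constraint holds, and branching as a component that routes data to one of two sub-workflows. This is a hypothetical Java sketch; `LoopComponent`, `branch` and the iteration cap are illustrative assumptions.

```java
import java.util.function.Predicate;
import java.util.function.UnaryOperator;

// Hypothetical sketch: control constructs implemented as components, not language keywords.
class LoopComponentSketch {

    /** Reruns an enclosed sub-workflow until a constraint is satisfied. */
    static final class LoopComponent<T> {
        private final UnaryOperator<T> subWorkflow; // the enclosed sub-workflow
        private final Predicate<T> done;            // constraint that ends the loop
        private final int maxIterations;            // safety bound on iterations

        LoopComponent(UnaryOperator<T> subWorkflow, Predicate<T> done, int maxIterations) {
            this.subWorkflow = subWorkflow;
            this.done = done;
            this.maxIterations = maxIterations;
        }

        T run(T input) {
            T value = input;
            for (int i = 0; i < maxIterations && !done.test(value); i++) {
                value = subWorkflow.apply(value);
            }
            return value;
        }
    }

    /** A logical component: routes data to one of two sub-workflows. */
    static <T> T branch(T input, Predicate<T> condition, UnaryOperator<T> ifTrue, UnaryOperator<T> ifFalse) {
        return condition.test(input) ? ifTrue.apply(input) : ifFalse.apply(input);
    }

    public static void main(String[] args) {
        // Constraint-based loop: halve the value until it falls below 0.1.
        LoopComponent<Double> loop = new LoopComponent<>(x -> x / 2, x -> x < 0.1, 100);
        System.out.println(loop.run(1.0)); // prints 0.0625
        System.out.println(branch(-4, x -> x >= 0, x -> x, x -> -x)); // prints 4
    }
}
```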
Distributing Triana Workflow
• Deploying remote services on resources
  - Service application installation
  - Service execution
  - Service discovery
• Mapping tasks or groups of tasks to services
  - Workflow rewiring: the XML definition of connections is modified for the remote location, and sub-workflows are duplicated
  - Data distribution: annotated sub-sections of the taskgraph are passed to resources
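A minimal sketch of connection rewiring, assuming a workflow is a list of (from, to) connections: tasks in a group mapped to a remote host get their endpoints rewritten, and the group's internal connections are duplicated once per host. The `triana://host/task` address format and the host names are hypothetical placeholders, not a Triana convention.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical rewiring sketch; the "triana://host/task" address format is a placeholder.
class RewiringSketch {

    record Connection(String from, String to) {}

    /** Rewrites every endpoint that belongs to the group so it points at the remote host. */
    static List<Connection> rewire(List<Connection> connections, Set<String> group, String host) {
        List<Connection> out = new ArrayList<>();
        for (Connection c : connections) {
            out.add(new Connection(locate(c.from(), group, host), locate(c.to(), group, host)));
        }
        return out;
    }

    /** Duplicates the group's internal connections once per host, each copy relocated to that host. */
    static Map<String, List<Connection>> duplicate(List<Connection> connections, Set<String> group, List<String> hosts) {
        List<Connection> internal = new ArrayList<>();
        for (Connection c : connections) {
            if (group.contains(c.from()) && group.contains(c.to())) internal.add(c);
        }
        Map<String, List<Connection>> copies = new LinkedHashMap<>();
        for (String host : hosts) copies.put(host, rewire(internal, group, host));
        return copies;
    }

    private static String locate(String task, Set<String> group, String host) {
        return group.contains(task) ? "triana://" + host + "/" + task : task;
    }

    public static void main(String[] args) {
        List<Connection> workflow = List.of(
                new Connection("Wave", "Gaussian"),
                new Connection("Gaussian", "FFT"),
                new Connection("FFT", "Grapher"));
        Set<String> group = Set.of("Gaussian", "FFT");
        System.out.println(rewire(workflow, group, "node1"));
        System.out.println(duplicate(workflow, group, List.of("node1", "node2")).keySet());
    }
}
```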
GEO 600 Inspiral Search
Background
• Compact binary stars orbiting each other in a close orbit are among the most powerful sources of gravitational waves
• As the orbital radius decreases, a characteristic chirp waveform is produced: amplitude and frequency increase with time until the two bodies eventually merge
Computing
• Needs 10 Gigaflops to keep up with real-time data (a modest search)
Data
• Sampled at 8 kHz with 24-bit resolution (stored in 4 bytes)
• Signal contained within 1 kHz, i.e. 2000 samples/second
• Divided into chunks of 15 minutes (900 seconds), roughly 8MB each
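The chunk size follows from the slide's figures: a signal contained within 1 kHz needs 2000 samples/second (the Nyquist rate), each sample is stored in 4 bytes, and a chunk spans 900 seconds. The product is 7.2 × 10^6 bytes, so the quoted 8MB reads as a rounded-up figure. The small Java sketch below just performs that arithmetic; the method names are illustrative.

```java
import java.util.Locale;

// Worked arithmetic for one data chunk, using the figures quoted on the slide.
class ChunkSizeSketch {

    /** Nyquist: a signal contained within bandwidthHz needs 2 * bandwidthHz samples per second. */
    static long samplesPerSecond(long bandwidthHz) {
        return 2 * bandwidthHz;
    }

    /** Bytes in one chunk = samples/s * chunk duration * bytes per stored sample. */
    static long chunkBytes(long bandwidthHz, long seconds, long bytesPerSample) {
        return samplesPerSecond(bandwidthHz) * seconds * bytesPerSample;
    }

    public static void main(String[] args) {
        long bytes = chunkBytes(1000, 900, 4); // 1 kHz band, 15-minute chunk, 4-byte samples
        System.out.println(String.format(Locale.ROOT, "%d samples/s, %d bytes (%.1f MB) per chunk",
                samplesPerSecond(1000), bytes, bytes / 1e6));
        // prints "2000 samples/s, 7200000 bytes (7.2 MB) per chunk"
    }
}
```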
Algorithm
• Data is transmitted to a node
• The node initialises, i.e. generates its templates (around 10000)
• The node fast-correlates its templates with the data
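A minimal sketch of the per-node correlation step, assuming each template is a short array of samples: every template is slid across the data chunk and the best normalised correlation is kept. It uses direct O(N·M) time-domain correlation for clarity, whereas a fast correlation would normally be done with FFTs; the toy `chirp` generator and all names are hypothetical, not the GEO 600 template bank.

```java
// Hypothetical sketch of per-node template correlation (direct, not FFT-based).
class TemplateBankSketch {

    record Match(int template, int lag, double score) {}

    /** Slides every template across the data and keeps the highest normalised correlation. */
    static Match bestMatch(double[] data, double[][] templates) {
        Match best = new Match(-1, -1, Double.NEGATIVE_INFINITY);
        for (int k = 0; k < templates.length; k++) {
            double[] t = templates[k];
            double tNorm = Math.sqrt(dot(t, t, 0));
            for (int lag = 0; lag + t.length <= data.length; lag++) {
                double segNorm = Math.sqrt(energy(data, lag, t.length));
                if (tNorm == 0 || segNorm == 0) continue; // nothing to correlate against
                double score = dot(t, data, lag) / (tNorm * segNorm);
                if (score > best.score()) best = new Match(k, lag, score);
            }
        }
        return best;
    }

    /** Sum of a[i] * b[offset + i] over the length of a. */
    private static double dot(double[] a, double[] b, int offset) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += a[i] * b[offset + i];
        return s;
    }

    /** Energy of x[offset .. offset + len). */
    private static double energy(double[] x, int offset, int len) {
        double s = 0;
        for (int i = 0; i < len; i++) s += x[offset + i] * x[offset + i];
        return s;
    }

    /** Toy chirp: frequency rises linearly with time (illustrative, not a physical waveform). */
    static double[] chirp(int n, double f0, double rate) {
        double[] y = new double[n];
        for (int i = 0; i < n; i++) {
            double t = (double) i / n;
            y[i] = Math.sin(2 * Math.PI * (f0 + rate * t) * t);
        }
        return y;
    }

    public static void main(String[] args) {
        // Node initialisation: generate a small template bank (the real search uses around 10000).
        double[][] bank = new double[4][];
        for (int k = 0; k < bank.length; k++) bank[k] = chirp(256, 2.0, 10.0 * (k + 1));

        // Toy data chunk: silence with template 2 embedded at sample 100.
        double[] data = new double[512];
        System.arraycopy(bank[2], 0, data, 100, 256);

        Match m = bestMatch(data, bank);
        System.out.println("template " + m.template() + " at lag " + m.lag());
    }
}
```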
Coalescing Binary Search
Figure: the GEO 600 coalescing binary search algorithm implemented as a Triana workflow.
Coalescing Binary Scenario
Figure: components shown are the CB Search Controller; the GridLab test-bed; GW data in distributed storage, addressed by logical file name through the GAT (Data Management); job submission with optimised mapping through the GAT (GRMS, Adaptive); and email and SMS notification.
Triana Service Job Submission
Figure: components shown are the GAP, the GRMS Web Service (rage1.man.poznan.pl) and the GridLab testbed.
Triana GRMS Component
• Front end to the GridLab GRMS Web Service
  - Job submission service; interfaces with GRAM
• GAP Web Service binding + GSI authentication
  - Java CoG Kit
  - X509 certificate handling
  - Axis authentication and communication
• GRMS executes applications on the GridLab testbed
  - Heterogeneous hardware platforms
  - Default software: Globus 2.4, GSISSH, cc, cvs, c++, F90, make, perl, mpicc
Service Composition Workflow
• Multiple GRMS components
• Install applications (ftp, tar, ant)
• Start the installed Triana Services
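The composition above can be sketched as a per-resource dependency: start the Triana Service only if the install job succeeded. `JobSubmitter` is a hypothetical stand-in for a GRMS front end, not the GRMS or GAP API, and the job names `install` and `start-service` are placeholders.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the service-composition workflow; not the GRMS or GAP API.
class ServiceCompositionSketch {

    /** Stand-in for a GRMS front end: submits a named job to a resource, returns true on success. */
    interface JobSubmitter {
        boolean submit(String host, String job);
    }

    /** For each resource: install the application, then start the service only if installation succeeded. */
    static List<String> deploy(JobSubmitter grms, List<String> hosts) {
        List<String> started = new ArrayList<>();
        for (String host : hosts) {
            // "install" stands for the transfer/unpack/build steps (ftp, tar, ant on the slide);
            // && short-circuits, so the start job is never submitted after a failed install.
            if (grms.submit(host, "install") && grms.submit(host, "start-service")) {
                started.add(host);
            }
        }
        return started;
    }

    public static void main(String[] args) {
        // Fake submitter that logs each job and always succeeds.
        JobSubmitter fake = (host, job) -> {
            System.out.println("submit " + job + " on " + host);
            return true;
        };
        System.out.println("started: " + deploy(fake, List.of("node1", "node2")));
    }
}
```

The resources are independent, so a real composition could submit them concurrently; the sequential loop only keeps the sketch short.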
Dynamic Distributed Workflow
• Distribution units are standard Triana tools, enabling users to create their own custom distributions
Figure: components shown are a Distribution Unit in local Triana, the Wave, Grapher, Gaussian and FFT tools, and remote services.
• The workflow is cloned, split and rewired to achieve the required distribution topology
• Custom distribution units allow sub-workflows to be distributed in parallel or in a pipeline
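A minimal sketch of a task-farming distribution unit, assuming the cloned sub-workflow can be modelled as a function applied to each input chunk. Threads from a fixed pool stand in for remote Triana Services here, whereas the real distribution goes over the network; `farm` is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical task-farming sketch: threads stand in for remote Triana Services.
class TaskFarmSketch {

    /** Applies the cloned sub-workflow to every chunk in parallel; results come back in input order. */
    static <I, O> List<O> farm(List<I> chunks, Function<I, O> subWorkflow, int services)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(services);
        try {
            List<Future<O>> pending = new ArrayList<>();
            for (I chunk : chunks) {
                pending.add(pool.submit(() -> subWorkflow.apply(chunk)));
            }
            List<O> results = new ArrayList<>();
            for (Future<O> f : pending) {
                results.add(f.get()); // waits for each clone in submission order
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Each chunk is a block of samples; the sub-workflow reduces it to its peak value.
        List<double[]> chunks = List.of(new double[] {1, 5, 2}, new double[] {7, 3}, new double[] {0, 4});
        List<Double> peaks = farm(chunks, c -> Arrays.stream(c).max().orElse(0), 2);
        System.out.println(peaks); // prints [5.0, 7.0, 4.0]
    }
}
```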
Conclusion
• Shown three distinct workflows:
  - Service composition workflow that submits Grid jobs to deploy multiple Triana Services on remote resources
  - Local scientific workflow representing the algorithm
  - Dynamic distributed workflow: rewires the local workflow for data parallelism across multiple Triana Services
• GAP API
  - Web Service binding + GSI: Grid job submission
  - P2PS binding: service discovery and service communication
• Combined to perform parallel scientific computation
Thanks !
• The Astronomers: Prof. B Sathyaprakash, David Churches, Roger Philp and Craig Robinson
• The Triana team: Ian Wang, Andrew Harrison, Omer Rana, Diem Lam and Shalil Majithia
• All the partners in the GridLab project