Remarks on Grids, e-Science, CyberInfrastructure and Peer-to-Peer Networks. Los Alamos, September 23 2003. Geoffrey Fox, Community Grids Lab, Indiana University, [email protected]


TRANSCRIPT

Page 1: Geoffrey Fox Community Grids Lab Indiana University gcf@indiana

Remarks on Grids, e-Science, CyberInfrastructure and Peer-to-Peer Networks

Los Alamos, September 23 2003

Geoffrey Fox, Community Grids Lab, Indiana University
[email protected]

Page 2:

What is a High Performance Computer?
• We might wish to consider three classes of multi-node computers
• 1) Classic MPP with microsecond latency and scalable internode bandwidth (tcomm/tcalc ~ 10 or so)
• 2) Classic Cluster, which can vary from configurations like 1) to 3) but typically has millisecond latency and modest bandwidth
• 3) Classic Grid or distributed systems of computers around the network
– Latencies of inter-node communication: 100's of milliseconds, but can have good bandwidth
• All have the same peak CPU performance, but synchronization costs increase as one goes from 1) to 3)
• Cost of system (dollars per gigaflop) decreases by factors of 2 at each step from 1) to 2) to 3)
• One should NOT use a classic MPP if class 2) or 3) suffices, unless some security or data issue dominates over cost-performance
• One should not use a Grid as a true parallel computer – it can link parallel computers together for convenient access etc.
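The synchronization argument above can be sketched numerically. This is an illustrative cost model, not a measurement; the per-step compute time and the three latencies are assumptions taken from the slide's rough figures.

```python
# Illustrative (not measured) cost model for the three machine classes.
# One step costs t_calc of computation plus one synchronization at the
# class's typical latency; all classes share the same peak CPU speed.
def time_per_step(t_calc, latency):
    """Wall-clock time for one compute step followed by a barrier."""
    return t_calc + latency

t_calc = 1e-3  # assume 1 ms of computation between synchronizations
classes = {
    "MPP":     1e-6,   # microsecond latency
    "Cluster": 1e-3,   # millisecond latency
    "Grid":    0.1,    # hundreds of milliseconds
}
for name, lat in classes.items():
    overhead = time_per_step(t_calc, lat) / t_calc
    print(f"{name}: {overhead:.1f}x time of pure computation")
```

With these numbers the MPP is essentially unaffected, the cluster doubles its step time, and the Grid spends ~100x the compute time waiting, which is why a Grid should not be used as a true parallel computer.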

Page 3:

What is a Grid I?
• Collaborative Environment (Ch2.2, 18)
• Combining powerful resources, federated computing and a security structure (Ch38.2)
• Coordinated resource sharing and problem solving in dynamic multi-institutional virtual organizations (Ch6)
• Data Grids as Managed Distributed Systems for Global Virtual Organizations (Ch39)
• Distributed Computing or distributed systems (Ch2.2, 10)
• Enabling Scalable Virtual Organizations (Ch6)
• Enabling use of enterprise-wide systems, and someday nationwide systems, that consist of workstations, vector supercomputers, and parallel supercomputers connected by local and wide area networks. Users will be presented the illusion of a single, very powerful computer, rather than a collection of disparate machines. The system will schedule application components on processors, manage data transfer, and provide communication and synchronization in such a manner as to dramatically improve application performance. Further, boundaries between computers will be invisible, as will the location of data and the failure of processors. (Ch10)

Page 4:

What is a Grid II?
• Supporting e-Science, representing increasing global collaborations of people and of shared resources that will be needed to solve the new problems of Science and Engineering (Ch36)
• Infrastructure that will provide us with the ability to dynamically link together resources as an ensemble to support the execution of large-scale, resource-intensive, and distributed applications (Ch1)
• Makes high-performance computers superfluous (Ch6)
• Metasystems or metacomputing systems (Ch10, 37)
• Middleware as the services needed to support a common set of applications in a distributed network environment (Ch6)
• Next Generation Internet (Ch6)
• Peer-to-peer Network (Ch10, 18)
• Realizing the thirty-year dream of science fiction writers who have spun yarns featuring worldwide networks of interconnected computers that behave as a single entity (Ch10)
• Technology on which to build CyberInfrastructure (NSF)
• The High Performance Computing world's view of the Web

The Grid for my purposes is "best practice" in all of this!

Page 5:

Taxonomy of Grid Functionalities

Name of Grid Type | Description of Grid Functionality
Compute/File Grid or Data File Grid | Run multiple jobs with distributed compute and data resources (a global "UNIX Shell")
Desktop Grid (e.g. SETI@Home) | "Internet Computing" and "Cycle Scavenging" with a secure sandbox on large numbers of untrusted computers
Information Grid or Data Service Grid | Grid service access to distributed information, data and knowledge repositories
Complexity or Hybrid Grid | Hybrid combination of Information and Compute/File Grid emphasizing integration of experimental data, filters and simulations: data assimilation
Campus Grid | Grid supporting University community computing
Enterprise Grid | Grid supporting a company's enterprise infrastructure

Page 6:

Classes of Computing Grid Applications
• Running "pleasingly parallel" jobs as in the United Devices and Entropia (Desktop Grid) "cycle stealing" systems
• Can be managed ("inside" the enterprise, as in Condor) or more informal (as in SETI@Home)
• Computing-on-demand in industry, where the jobs spawned are perhaps very large (SAP, Oracle …)
• Support for distributed file systems as in Legion (Avaki) and Globus, with a (web-enhanced) UNIX programming paradigm
– Particle Physics will run some 30,000 simultaneous jobs this way
• Pipelined applications linking data/instruments, compute and visualization
• Seamless Access, where Grid portals allow one to choose one of multiple resources with a common interface

Page 7:

Information/Knowledge Grids
• These are typified by virtual observatory and bioinformatics applications
• Distributed (10's to 1000's of) data sources (instruments, file systems, curated databases …)
• Possible filters assigned dynamically
– Run an image processing algorithm on a telescope image
– Run a gene sequencing algorithm on data from EBI/NCBI
• Integrate across experiments as in multi-wavelength astronomy
• Needs a decision support front end with "what-if" simulations
• Metadata (provenance) is critical to annotate data
• SERVOGrid (Solid Earth Research Virtual Observatory) will link Japan, Australia, USA

Page 8:

[Figure: SERVOGrid Caricature. Repositories/federated databases and sensor nets/streaming data feed loosely coupled filters and closely coupled compute nodes, with analysis and visualization at the end of the pipeline.]

Page 9:

Sources of Grid Technology
• Grids support distributed collaboratories or virtual organizations, integrating concepts from:
• The Web
• Agents
• Distributed Objects (CORBA, Java/Jini, COM)
• Globus, Legion, Condor, NetSolve, Ninf and other High Performance Computing activities
• Peer-to-peer Networks
• With perhaps the Web and P2P networks being the most important for "Information Grids" and Globus for "Compute Grids"

Page 10:

The Essence of Grid Technology?
• We will start from the Web view and assert that the basic paradigm is:
• Metadata-rich Web Services communicating via messages
• These have some basic support from a runtime such as .NET, Jini (pure Java), Apache Tomcat+Axis (Web Service toolkit), Enterprise JavaBeans, WebSphere (IBM) or GT3 (Globus Toolkit 3)
– These are the distributed equivalent of operating system functions as in the UNIX Shell
– Called the Hosting Environment or platform
• The W3C standard WSDL defines the IDL (interface standard) for Web Services

Page 11:

Services and Distributed Objects
• A web service is a computer program running on either the local or a remote machine, with a set of well-defined interfaces (ports) specified in XML (WSDL)
• Web Services (WS) have many similarities with Distributed Object (DO) technology, but there are some (important) technical and religious points
– CORBA, Java, COM are typical DO technologies
– Agents are typically SOA (Service Oriented Architecture)
• Both involve distributed entities, but Web Services are more loosely coupled
– WS interact with messages; DO with RPC
– DO have "factories"; WS manage instances internally, and interaction-specific state is not exposed and hence need not be managed
– DO have explicit state (stateful services); WS use context in the messages to link interactions (stateful interactions)
• Claim: DOs do NOT scale; WS build on experience (with CORBA) and do scale
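The DO-versus-WS distinction above can be sketched in a few lines. Both classes here are hypothetical illustrations, not any real toolkit: the distributed object holds explicit per-client state, while the web service exposes no state and links interactions only through correlation information carried in each message.

```python
# Hypothetical contrast sketch: stateful Distributed Object vs. a
# Web Service whose "state" is linked by message context only.
class DistributedObject:
    """DO style: explicit state lives in the object instance."""
    def __init__(self):
        self.cart = []            # stateful service: queryable state
    def add(self, item):
        self.cart.append(item)

class WebService:
    """WS style: no exposed state; context travels in the message."""
    def __init__(self):
        self._sessions = {}       # managed internally, never exposed
    def handle(self, message):
        cart = self._sessions.setdefault(message["correlation_id"], [])
        cart.append(message["item"])
        # stateful *interaction*: messages linked by correlation_id
        return {"correlation_id": message["correlation_id"],
                "count": len(cart)}

ws = WebService()
ws.handle({"correlation_id": "abc", "item": "book"})
reply = ws.handle({"correlation_id": "abc", "item": "cd"})
print(reply["count"])  # second message in the same interaction
```

Because the web service never hands out an object reference, its instances can be created, moved or garbage-collected freely, which is the loose coupling the slide credits for scalability.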

Page 12:

A Typical Web Service
• In principle, services can be in any language (Fortran .. Java .. Perl .. Python) and the interfaces can be method calls, Java RMI messages, CGI Web invocations, or totally compiled away (inlining)
• The simplest implementations involve XML messages (SOAP) and programs written in net-friendly languages like Java and Python

[Figure: an e-commerce example with Security, Catalog, Payment (Credit Card) and Warehouse (shipping) services connected through WSDL interfaces]

Page 13:

Details of the Web Service Protocol Stack
• UDDI finds where programs are
– remote (distributed) programs are just Web Services
– (not a great success)
• WSFL links programs together (under revision as BPEL4WS)
• WSDL defines the interface (methods, parameters, data formats)
• SOAP defines the structure of the message, including serialization of information
• HTTP is the negotiation/transport protocol
• TCP/IP is layers 3-4 of OSI
• Physical Network is layer 1 of OSI

Stack, top to bottom: UDDI or WSIL | WSFL | WSDL | SOAP or RMI | HTTP or SMTP or IIOP or RMTP | TCP/IP | Physical Network
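To make the SOAP layer of the stack concrete, here is a minimal SOAP 1.1 envelope built with Python's standard library. The `getPrice`/`item` payload is a made-up operation for illustration; a real service would define it in its WSDL.

```python
import xml.etree.ElementTree as ET

# Minimal SOAP 1.1 envelope: an Envelope wrapping a Body that carries
# the application payload. The operation name is hypothetical.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
env = ET.Element(f"{{{SOAP_NS}}}Envelope")
body = ET.SubElement(env, f"{{{SOAP_NS}}}Body")
op = ET.SubElement(body, "getPrice")
ET.SubElement(op, "item").text = "widget"

message = ET.tostring(env, encoding="unicode")
print(message)
# On the wire this XML would be POSTed over HTTP (or carried by SMTP,
# IIOP, ...), which is the transport layer beneath SOAP in the stack.
```

The envelope/body nesting is all SOAP itself prescribes; everything inside the body is application-defined, which is why WSDL is needed one layer up to describe it.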

Page 14:

What are System and Application Services?
• There are generic Grid system services: security, collaboration, persistent storage, universal access
– OGSA (Open Grid Service Architecture) is implementing these as extended Web Services
• An Application Web Service is a capability used either by another service or by a user
– It has input and output ports; data comes from sensors or other services
• Consider satellite-based sensor operations as a Web Service
– Satellite management (with a web front end)
– Each tracking station is a service
– Image processing is a pipeline of filters, which can be grouped into different services
– Data storage is an important system service
– Big services are built hierarchically from "basic" services
• Portals are the user (web browser) interfaces to Web services

Page 15:

Application Web Services
• Note the Service model integrates sensors, sensor analysis, simulations and people
• An Application Web Service is a capability used either by another service or by a user
– It has input and output ports; data comes from users, sensors or other services
– Big services are built hierarchically from "basic" services

[Figure: a Sensor Data WS and a Sensor Management WS feed a Data Analysis WS, built as multiple Filter Web Services (Filter1 WS, Filter2 WS, Filter3 WS), which feeds a Simulation WS, built as multiple interdisciplinary programs (Prog1 WS, Prog2 WS), and a Visualization WS]
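The hierarchical composition in the figure can be sketched as plain function composition: a "big" Data Analysis service built by chaining three filter services port-to-port. The filter bodies are invented placeholders matching the figure's box names.

```python
# Sketch of hierarchical service composition: three filter "services"
# wired output-port to input-port yield one composite service.
# Filter behaviors are placeholders, not real algorithms.
def filter1(data):                 # e.g. calibrate the raw readings
    return [x * 2 for x in data]

def filter2(data):                 # e.g. select events above threshold
    return [x for x in data if x > 2]

def filter3(data):                 # e.g. reduce to a summary value
    return sum(data)

def compose(*stages):
    """Wire stages in sequence, returning a single composite service."""
    def pipeline(data):
        for stage in stages:
            data = stage(data)
        return data
    return pipeline

data_analysis_ws = compose(filter1, filter2, filter3)
print(data_analysis_ws([1, 2, 3]))   # sensor data in, analyzed value out
```

In a real Grid each stage would be a remote Web Service and the wiring a workflow description rather than a local function call, but the hierarchical structure is the same.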

Page 16:

What is Happening?
• Grid ideas are being developed in (at least) two communities
– Web Services: W3C, OASIS
– Grid Forum (High Performance Computing, e-Science)
• Service standards are being debated
• Grid operational infrastructure is being deployed
• Grid architecture and core software are being developed
• Particular system services are being developed "centrally"
– OGSA is the framework for this
• Lots of fields are setting domain-specific standards and building domain-specific services
• There is a lot of hype
• Grids are viewed differently in different areas
– Largely "computing-on-demand" in industry (IBM, Oracle, HP, Sun)
– Largely distributed collaboratories in academia

Page 17:

Grid Applications
• Cope with the Data Deluge (Moore's law for detectors)
• Astronomy: virtual observatories
• Biology: distributed repositories and filtering
• Chemistry: online laboratories
• Earth/Environmental Science: distributed sensors
• Engineering: distributed monitors
• Health: medical instruments and images
• Particle Physics: analyze LHC data
• Gridsourcing: animation in China, software in India, design/leadership in USA
– Basketball coaching in Indiana, players in China
– Teachers in Los Alamos, students in universities
• Command and Control for DoD
• Federation of information systems and modeling and simulation
• Problem Solving Environments and software integration

Page 18:

DAME: Rolls Royce and the UK e-Science Program (Distributed Aircraft Maintenance Environment)

[Figure: in-flight engine data flows via a global network such as SITA to a ground station and the Engine Health (Data) Center, linking the airline and maintenance centre over the Internet, e-mail and pager. Roughly a gigabyte of data per engine per transatlantic flight; ~5000 engines]

Page 19:

OGSA, OGSI and Hosting Environments
• Start with Web Services in a hosting environment
• Add OGSI to get a Grid service and a component model
• Add OGSA to get an interoperable Grid, "correcting" differences in the base platform and adding key functionalities

[Figure: layered view. The Network and the Hosting Environment for WS ("given to us from on high", not OGSA) at the bottom; OGSI on Web Services above them; broadly applicable services (registry, authorization, monitoring, data access, etc.) forming the OGSA environment; more specialized services (data replication, workflow, etc.), possibly OGSA; domain-specific services on top]

Page 20:

OGSI: Open Grid Services Infrastructure
• http://www.gridforum.org/ogsi-wg
• It is a "component model" for web services
• It defines a set of behavior patterns that each OGSI service must exhibit
• Every "Grid Service" portType extends a common base type
– Defines an introspection model for the service
– You can query it (in a standard way) to discover:
• What methods/messages a port understands
• What other port types the service provides
• If the service is "stateful", what the current state is
• Factory model
• A set of standard portTypes for:
– Message subscription and notification
– Service collections
• Each service is identified by a URI called the "Grid Service Handle" (GSH)
• GSHs are bound dynamically to Grid Service References (GSRs, typically WSDL docs)
– A GSR may be transient; GSHs are fixed
– Handle-map services translate GSHs into GSRs
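The GSH/GSR indirection above can be sketched as a tiny look-up service. The class, the URN and the WSDL URLs are all hypothetical illustrations of the idea, not real OGSI API.

```python
# Sketch of a handle-map service: the Grid Service Handle (GSH) is a
# permanent URI; the Grid Service Reference (GSR, e.g. a WSDL document
# location) is transient and may be re-bound. Names are invented.
class HandleMap:
    def __init__(self):
        self._bindings = {}
    def bind(self, gsh, gsr):
        self._bindings[gsh] = gsr      # GSR may change over time
    def resolve(self, gsh):
        return self._bindings[gsh]     # the GSH itself never changes

hm = HandleMap()
gsh = "urn:grid-service:weather-sim"       # fixed handle, held by clients
hm.bind(gsh, "http://hostA/weather?wsdl")  # current transient reference
hm.bind(gsh, "http://hostB/weather?wsdl")  # service migrated; handle same
print(hm.resolve(gsh))
```

Clients hold only the stable GSH and resolve it on each use, so a service can migrate or be restarted without breaking anyone's references.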

Page 21:

OGSI and Stateful Services
• Sometimes you can send a message to a service, get a result, and that's the end
– This is a state-free service
• However, most non-trivial services need state to allow persistent asynchronous interactions
• OGSI is designed to support stateful services through two mechanisms
– An information port, where you can query for SDEs (Service Data Elements)
– "Factories" that allow one to view a service as a "class" (in the object-oriented language sense) and create separate instances for each service invocation
• There are several interesting issues here
– The difference between stateful interactions and stateful services
– System- or service-managed instances

Page 22:

Factories and OGSI
• Stateful interactions are typified by amazon.com, where messages carry correlation information allowing multiple messages to be linked together
– Amazon preserves state in this fashion, which is in fact preserved in its database permanently
• Stateful services have state that can be queried outside a particular interaction
• Also note the difference between implicit and explicit factories
– Some claim that implicit factories scale better, as each service manages its own instances and so need not worry about registering instances and lifetime management
• See WS-Addressing, largely from IBM and Microsoft: http://msdn.microsoft.com/webservices/default.aspx?pull=/library/en-us/dnglobspec/html/ws-addressing.asp

[Figure: an explicit factory hands out instances 1-4 to clients on request; an implicit factory creates instances 1-4 internally as messages arrive]
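The two factory styles in the figure can be sketched side by side. Both classes are hypothetical illustrations: in the explicit style the client asks for an instance and must hold its handle; in the implicit style the service creates or finds the instance itself from the message's correlation information.

```python
# Hypothetical sketch of explicit vs. implicit factories.
class ExplicitFactory:
    """Client calls create() and must track the returned handle."""
    def __init__(self):
        self.instances = {}
    def create(self):
        handle = len(self.instances) + 1
        self.instances[handle] = {"state": []}   # registered instance
        return handle

class ImplicitService:
    """Client just sends messages; instances appear on first use."""
    def __init__(self):
        self.instances = {}
    def handle_message(self, correlation_id, payload):
        inst = self.instances.setdefault(correlation_id, {"state": []})
        inst["state"].append(payload)            # lifetime managed here
        return len(inst["state"])

f = ExplicitFactory()
h = f.create()                          # explicit: client now holds h
s = ImplicitService()
s.handle_message("order-42", "itemA")   # implicit: instance appears
print(h, s.handle_message("order-42", "itemB"))
```

The scaling claim in the slide corresponds to the implicit style: there is no registry of handles to keep consistent and no separate lifetime-management protocol for clients to run.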

Page 23:

Technical Activities of Note
• Look at different styles of Grids, such as Autonomic (Robust Reliable Resilient)
• New Grid architectures are hard due to the investment required
• Critical services such as:
– Security: build message-based, not connection-based
– Notification: event services
– Metadata: use the Semantic Web, provenance
– Databases and repositories: instruments, sensors
– Computing: job submission, scheduling, distributed file systems
– Visualization, computational steering
– Fabric and service management
– Network performance
• Program the Grid: workflow
• Access the Grid: portals, Grid Computing Environments

Page 24:

Issues and Types of Grid Services
• 1) Types of Grid
– R3 (Robust Reliable Resilient)
– Lightweight
– P2P
– Federation and interoperability
• 2) Core Infrastructure and Hosting Environment
– Service management
– Component model
– Service wrapper/invocation
– Messaging
• 3) Security Services
– Certificate authority
– Authentication
– Authorization
– Policy
• 4) Workflow Services and Programming Model
– Enactment engines (runtime)
– Languages and programming
– Compiler
– Composition/development
• 5) Notification Services
• 6) Metadata and Information Services
– Basic, including registry
– Semantically rich services and metadata
– Information aggregation (events)
– Provenance
• 7) Information Grid Services
– OGSA-DAI/DAIT
– Integration with compute resources
– P2P and database models
• 8) Compute/File Grid Services
– Job submission
– Job planning, scheduling, management
– Access to remote files, storage and computers
– Replica (cache) management
– Virtual data
– Parallel computing
• 9) Other services, including
– Grid shell
– Accounting
– Fabric management
– Visualization, data-mining and computational steering
– Collaboration
• 10) Portals and Problem Solving Environments
• 11) Network Services
– Performance
– Reservation
– Operations

Page 25:

Technology Components of (Services in) a Computing Grid

[Figure: numbered services around remote Grid services and data. 1: Job Management Service (Grid service interface to the user or program client) and Plan Execution; 2: Schedule and Control Execution; 3: Access to Remote Computers; 4: Job Submittal; 5: Data Transfer; 6: File and Storage Access; 7: Cache Data Replicas; 8: Virtual Data; 9: Grid MPI; 10: Job Status]

Page 26:

Taxonomy of Grid Operational Style

Name of Grid Style | Description of Grid Operational or Architectural Style
Semantic Grid | Integration of Grid and Semantic Web metadata and ontology technologies
Peer-to-peer Grid | Grid built with peer-to-peer mechanisms
Lightweight Grid | Grid designed for rapid deployment and minimum life-cycle support costs
Collaboration Grid | Grid supporting collaborative tools like the Access Grid, whiteboards and shared applications
RRR (Robust Reliable Resilient) or Autonomic Grid | Fault-tolerant and self-healing Grid

Page 27:

Virtualization
• The Grid could and sometimes does virtualize various concepts; it should do more
• Location: the URI (Universal Resource Identifier) virtualizes the URL (WS-Addressing goes further)
• Replica management (caching) virtualizes file location, generalized by the GriPhyN virtual data concept
• Protocol: message transport and WSDL bindings virtualize the transport protocol as a QoS request
• P2P or publish-subscribe messaging virtualizes the matching of source and destination services
• The Semantic Grid virtualizes knowledge as a metadata query
• Brokering virtualizes resource allocation
• Virtualization implies all references can be indirect, and needs powerful mapping (look-up) services: metadata
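The publish-subscribe point above can be sketched with a toy broker: producers and consumers never name each other, so the source-destination binding is indirect, resolved by the broker's topic look-up. The broker class and topic name are invented for illustration.

```python
# Toy publish-subscribe broker: source and destination services are
# matched by topic, never by direct reference. All names invented.
class Broker:
    def __init__(self):
        self.subs = {}
    def subscribe(self, topic, callback):
        self.subs.setdefault(topic, []).append(callback)
    def publish(self, topic, msg):
        # The look-up that virtualizes the destination happens here.
        for cb in self.subs.get(topic, []):
            cb(msg)

received = []
broker = Broker()
broker.subscribe("sensors/quake", received.append)
broker.publish("sensors/quake", {"magnitude": 5.1})  # sender never
print(received)                                      # names receiver
```

Because every reference is indirect, subscribers can be added, moved or replaced without touching the publisher, which is exactly the mapping-service role the slide assigns to metadata.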

Page 28:

Metadata and the Semantic Grid
• Can store metadata in one catalog, multiple catalogs, or in each service
– Not clear how a coherent approach will develop
• Specialized metadata services like UDDI and MDS (Globus)
– Nobody likes UDDI
– MDS uses old-fashioned LDAP
– R-GMA is MDS with a relational database backend
• Some basic XML databases (Oracle, Xindice …)
• "By hand", as in the current SERVOGrid portal, which is roughly the same as using service-stored SDEs (Service Data Elements) as in OGSI
• The Semantic Web (DARPA) produced a lot of metadata tools aimed at annotating and searching/reasoning about metadata-enhanced web pages
– The Semantic Grid uses these for enriching Web Services
– Implies an interesting programming model, with traditional analysis (compiler) augmented by metadata annotation

Page 29:

Three Metadata Architectures

[Figure: (1) individual services, each exposing its SDEs through information ports; (2) Grid or domain-specific metadata catalogs backed by databases; (3) a system or federated registry or metadata catalog spanning Database1, Database2 and Database3]

Page 30:

SERVOGrid Complexity

[Figure: a Complexity Simulation Service and an XML Metadata Service linked by workflow; jobs, tools and Complexity scripts; SERVO PSE programs using CCEML (SERVOML); multiscale ontologies, job metadata, tool metadata and selected GeoInformatics data feeding the metadata service]

The Metadata Service is important; how should it be implemented?

Page 31:

SERVOGrid Requirements
• Seamless access to data repositories and large-scale computers
• Integration of multiple data sources (sensors, databases, file systems) with the analysis system
– Including filtered OGSA-DAI
• Rich metadata generation and access, with SERVOGrid-specific schema extending OpenGIS standards and using the Semantic Grid
• Portals with a component model for user interfaces and web control of all capabilities
• Collaboration to support world-wide work
• Basic Grid tools: workflow and notification

Page 32:

Approach
• Build on e-Science methodology and Grid technology
• Science applications with multi-scale models, scalable parallelism and data assimilation as key issues
– Data-driven models for earthquakes, climate, environment …
• Use existing code/database technology (SQL/Fortran/C++) linked to "Application Web/OGSA services"
– XML specification of models, with computational steering and scale supported at the "Web Service" level, as we don't need "high performance" here
– Allows use of Semantic Grid technology

[Figure: typical codes wrapped as an Application WS, with WS linking to the user and to other WS (data sources)]

Page 33:

Integration of Data and Filters
• One has the OGSA-DAI data repository interface combined with the WSDL of the (Perl, Fortran, Python …) filter
• The user only sees WSDL, not the data syntax
• There are some non-trivial issues as to where the filtering compute power sits
– Microsoft says: put the filter next to the data

[Figure: a database with an OGSA-DAI interface feeding a filter exposed through its WSDL]

Page 34:

SERVOGrid Complexity Computing Environment

[Figure: users access a CCE Control Portal (aggregation); a middle tier with XML interfaces connects Application Services 1-3, an XML Metadata Service and a Complexity Simulation Service to back-end Database, Sensor, Compute, Parallel Simulation and Visualization Services]

Page 35:

SERVOGrid (Complexity) Computing Model: Grid Data Assimilation

[Figure: distributed data filters massage data for an HPC simulation; OGSA-DAI Grid services and other Grid and Web services feed analysis control and visualization. This type of Grid integrates with parallel computing: multiple HPC facilities, but only one is used at a time; many simultaneous data sources and sinks]

Page 36:

Data Assimilation
• Data assimilation implies one is solving some optimization problem, which might have a Kalman-filter-like structure
• As discussed by the DAO at the Earth Science meeting, one will become more and more dominated by the data (Nobs much larger than the number of simulation points)
• The natural approach is to form, for each local (position, time) patch, the "important" data combinations, so that the optimization doesn't waste time on large-error or insensitive data
• Data reduction is done in a naturally distributed fashion, NOT on the HPC machine, as distributed computing is most cost-effective when the calculations are essentially independent
– Filter functions must be transmitted from the HPC machine

\min_{\text{Theoretical Unknowns}} \sum_{i=1}^{N_{\mathrm{obs}}} \frac{\left(\mathrm{Data}_i(\text{position},\text{time}) - \mathrm{Simulated\ Value}_i\right)^2}{\mathrm{Error}_i^2}

Page 37:

Distributed Filtering

[Figure: geographically distributed sensor patches send Nobs(local patch) raw observations to data filters on a distributed machine; each filter returns Nfiltered(local patch) values to the HPC machine, which sends out the needed filter and receives the filtered data. Nobs(local patch) >> Nfiltered(local patch) ≈ Number_of_Unknowns(local patch)]

In the simplest approach, the filtered data are obtained by linear transformations on the original data, based on a Singular Value Decomposition of the least-squares matrix (factorize the matrix into a product over local patches).

Page 38:

Two-level Programming I
• The paradigm implicitly assumes a two-level programming model
• We make a Service (the same as a "distributed object" or "computer program" running on a remote computer) using conventional technologies
– A C++, Java or Fortran Monte Carlo module
– Data streaming from a sensor or satellite
– Specialized (JDBC) database access
• Such services accept and produce data from users, files and databases
• The Grid is built by coordinating such services, assuming we have solved the problem of programming the service

[Figure: a Service exchanging Data]

Page 39:

Two-level Programming II
• The Grid addresses the composition of distributed services, with runtime interfaces to the Grid as opposed to UNIX pipes/data streams
• Familiar from the use of UNIX Shell, Perl or Python scripts to produce real applications from core programs
• Such interpretative environments are the single-processor analog of Grid programming
• Some projects, like GrADS from Rice University, are looking at integration between the service and composition levels, but the dominant effort looks at each level separately

[Figure: Service1, Service2, Service3 and Service4 composed into one application]

Page 40:

Why We Can Dream of Using HTTP and That Slow Stuff
• We have at least three tiers in the computing environment:
• Client (user portal)
• "Middle Tier" (Web servers/brokers)
• Back end (databases, files, computers etc.)
• In Grid programming, we use HTTP (and used to use CORBA and Java RMI) in the middle tier ONLY to manipulate a proxy for the real job
– The proxy holds metadata
– Control communication in the middle tier uses only metadata
– The "real" (data transfer) high-performance communication happens in the back end
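The proxy idea can be sketched as follows. Everything here is invented for illustration (the class, the job id, the gridftp-style URLs): the middle tier passes around a small metadata record, while the bulk bytes would move directly between back-end resources.

```python
# Sketch of the three-tier proxy idea: the slow (HTTP-like) control
# channel carries only metadata; bulk data never crosses the middle
# tier. All names and URLs are hypothetical.
class JobProxy:
    """Middle-tier stand-in for the real job: metadata only."""
    def __init__(self, job_id, input_url, output_url):
        self.metadata = {"job": job_id, "in": input_url, "out": output_url}

def control_message_size(proxy):
    # What actually travels over HTTP in the middle tier.
    return len(str(proxy.metadata))

def backend_transfer(input_url, output_url, nbytes):
    # The high-performance path: gigabytes move resource-to-resource.
    return (input_url, output_url, nbytes)

p = JobProxy("sim-7", "gridftp://hostA/in.dat", "gridftp://hostB/out.dat")
print(control_message_size(p) < 1024)   # control traffic stays tiny
```

Because the control messages are a few hundred bytes regardless of job size, HTTP's latency and overhead are irrelevant to end-to-end performance.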

Page 41:

[Figure: Grid architecture layers. Grid Computing Environments (portal services, user services, application metadata) sit above the "Core" Grid of system services and an Application Service in the middleware, which in turn sits above the raw (HPC) resources, databases and the actual application]

Page 42:

Workflow and the SERVOGrid CCE
• SERVOGrid will use workflow technology to support both
– "code and data coupling"
– multiscale features
• Implementing a multiscale model requires:
– building Web services for each model,
– describing each model with metadata,
– describing the linkage of models (linkage of ports on web services),
– and describing when to use which scale model
• So workflow and multiscale depend on web services described by rich metadata
• This analysis isn't correct if the scales must be "tightly coupled", as current workflow won't support this (an area addressed by CCA from DoE)
– We should focus on multiscale models with loose "service" coupling
– Hopefully we will learn how to take the same architecture, compile away inefficiencies, and get higher performance on tighter coupling than conventional distributed workflow
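The four requirements above (services per model, metadata per model, linkage, scale selection) can be sketched as a toy enactment engine. The service names, metadata schema and workflow format are all invented for illustration; real SERVOGrid workflows would be described in CCEML/SERVOML and run by a proper enactment engine.

```python
# Toy workflow enactment: each model is a "service" with metadata,
# the workflow names the linkage, and an engine runs them in order.
# Names, metadata fields and behaviors are invented.
services = {
    "fault_model": {"scale": "km",
                    "run": lambda x: x + ["fault output"]},
    "quake_sim":   {"scale": "m",
                    "run": lambda x: x + ["quake output"]},
}

workflow = ["fault_model", "quake_sim"]   # linkage of ports, in order

def enact(workflow, data):
    """Loosely coupled enactment: pass each service's output onward."""
    for name in workflow:
        data = services[name]["run"](data)
    return data

print(enact(workflow, ["sensor data"]))
```

The `scale` metadata is what a smarter engine would consult to decide "when to use which scale model"; here it is only recorded, which matches the slide's point that everything hinges on services carrying rich metadata.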