1
Observations on Architecture, Protocols, Services, APIs, SDKs, and the Role of the Grid Forum
Ian Foster
With: Carl Kesselman, Steven Tuecke
Thanks also to: Bill Johnston, Marty Humphrey, Rusty Lusk, Reagan Moore, and others
2
Overview
1. The Grid problem: controlled resource sharing in multi-institutional settings
2. Standards as a means of enabling sharing of code, resources, services
3. Aside: definition, role, and importance of protocols, services, SDKs, APIs, etc.
4. A “Grid Architecture”: a categorization of protocols, services, SDKs, and APIs
5. Questions for the Grid Forum
3
The Grid Problem
Grid R&D has its origins in high-end computing & metacomputing, but…
In practice, the “Grid problem” is about resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations
  – Lack of central control, omniscience, trust
Primary challenge: to enable, maintain, and control the sharing of resources to achieve a common goal
4
Examples of Virtual Organizations
Members of a scientific collaboration
  – E.g., NSF PACIs, IPG, NEESgrid, GriPhyN
  – Sharing: computers, storage, software, …
Application service provider + customers
  – Sharing: ASP computers
Participants in a peer-to-peer network
  – E.g., Gnutella, Napster, Entropia, …
  – Sharing: resources on individual PCs
Tremendous variety in scope, timescale, types of sharing, etc.
5
Universal Nature of the Grid Problem
“Sharing” is fundamental in many settings
  – Application Service Providers, Storage Service Providers, etc.; Peer-to-peer computing; Distributed computing; Business to business; …
Sharing issues are not adequately addressed by existing technologies
  – Sharing at a deep level, across broad ranges of resources and in a general way
  – E.g., user provides ASP with controlled access to their data on an SSP: how??
Grid community has unique experience
6
Creating Usable Grids: What are the Challenges?
Approaches to problem solving
  – Data Grids, distributed computing, peer-to-peer, collaboration grids, …
Structuring and writing programs
  – Abstractions, tools
Enabling resource sharing across distinct institutions
  – Resource discovery, access, reservation, allocation; authentication, authorization, policy; communication; fault detection and notification; …
7
What is the Role of Grid Forum in Enabling Grid Computing?
1. Information exchange, of course
  – Experiences, patterns, structures
  – Useful even if every application & Grid is a vertical “stovepipe”
2. Advocacy
3. Enabler of shared effort
  – In code development: libraries, tools, …
  – Via resource sharing: shared Grids
  – In infrastructure
Opinion: Long term, only the third is sufficiently compelling to justify GF
8
Q: How do we Enable Shared Effort?
A: “Standards” are Required
To enable portability/sharing of code
  – E.g., MPI lets me write portable parallel programs
To enable resource sharing
  – E.g., IP lets my computer speak to yours
To enable shared infrastructure
  – E.g., X.509 lets me share Certificate Authorities
But what sorts of “standards”?
  – Variously, APIs/SDKs, protocols, syntax, …
  – Observe that these are sometimes confused, so let’s spend some time on definitions …
9
Some Important Definitions
Resource
Network protocol
Network enabled service
Application Programmer Interface (API)
Software Development Kit (SDK)
Syntax
Not discussed, but important: policies
10
Resource
An entity that is to be shared
  – E.g., computers, storage, data, software
Does not have to be a physical entity
  – E.g., Condor pool, distributed file system, …
Defined in terms of interfaces, not devices
  – E.g., schedulers such as LSF and PBS define a compute resource
  – Open/close/read/write define access to a distributed file system, e.g., NFS, AFS, DFS
11
Network Protocol
A formal description of message formats and a set of rules for message exchange
  – Rules may define sequence of message exchanges
  – Protocol may define state-change in endpoint, e.g., file system state change
Good protocols are designed to do one thing
  – Protocols can be layered
Examples of protocols
  – IP, TCP, TLS (was SSL), HTTP, Kerberos
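The three ingredients of the definition above (a message format, rules for the sequence of exchanges, and state change in the endpoint) can be sketched in a few lines of Python. This is a made-up toy protocol for illustration only, not any protocol named on the slide; `MsgType`, `encode`, `decode`, and `Endpoint` are hypothetical names.

```python
import struct
from enum import IntEnum
from typing import Tuple


class MsgType(IntEnum):
    OPEN = 1
    WRITE = 2
    CLOSE = 3


# Message format (assumed for this sketch):
# 1-byte type code, 2-byte big-endian payload length, then the payload.
HEADER = struct.Struct("!BH")


def encode(msg_type: MsgType, payload: bytes = b"") -> bytes:
    return HEADER.pack(msg_type, len(payload)) + payload


def decode(data: bytes) -> Tuple[MsgType, bytes]:
    type_code, length = HEADER.unpack_from(data)
    payload = data[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated message")
    return MsgType(type_code), payload


class Endpoint:
    """Receiving side: enforces the exchange rules and holds the state."""

    def __init__(self) -> None:
        self.is_open = False
        self.stored = b""  # stands in for "file system state"

    def handle(self, data: bytes) -> str:
        msg_type, payload = decode(data)
        if msg_type is MsgType.OPEN and not self.is_open:
            self.is_open = True
            return "OK"
        if msg_type is MsgType.WRITE and self.is_open:
            self.stored += payload  # state change in the endpoint
            return "OK"
        if msg_type is MsgType.CLOSE and self.is_open:
            self.is_open = False
            return "OK"
        return "ERROR"  # message arrived out of sequence
```

A WRITE sent before OPEN is rejected, which is the sequencing rule at work; the byte layout alone would not tell a peer that.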
12
Network Enabled Services
Implementation of a protocol that defines a set of capabilities
  – Protocol defines interaction with service
  – All services require protocols
  – Not all protocols are used to provide services (e.g., IP, TLS)
Examples: FTP and Web servers
[Figure: protocol stacks for two services. Web Server: IP Protocol, TCP Protocol, TLS Protocol, HTTP Protocol. FTP Server: IP Protocol, TCP Protocol, FTP Protocol, Telnet Protocol.]
13
Application Programmer Interface
A specification for a set of routines to facilitate application development
  – Refers to definition, not implementation
  – E.g., there are many implementations of MPI
Spec often language-specific (or IDL)
  – Routine name, number, order and type of arguments; mapping to language constructs
  – Behavior or function of routine
Examples
  – GSS API (security), MPI (message passing)
14
Software Development Kit
A particular instantiation of an API
SDK consists of libraries and tools
  – Provides implementation of API specification
Can have multiple SDKs for an API
Examples of SDKs
  – MPICH, Motif Widgets
15
Syntax
Rules for encoding information, e.g.
  – XML, Condor ClassAds, Globus RSL
  – X.509 certificate format (RFC 2459)
  – Cryptographic Message Syntax (RFC 2630)
Distinct from protocols
  – One syntax may be used by many protocols (e.g., XML); & useful for other purposes
Syntaxes may be layered
  – E.g., Condor ClassAds -> XML -> ASCII
  – Important to understand layerings when comparing or evaluating syntaxes
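To make the layering of syntaxes concrete, here is a small sketch in which attribute/value pairs are mapped onto XML, and the XML is in turn serialized as ASCII bytes. The record, the `resource`/`attr` element names, and the function names are assumptions made for this illustration; this is not the real Condor ClassAd syntax or any standard XML mapping of it.

```python
import xml.etree.ElementTree as ET


def record_to_xml(record: dict) -> ET.Element:
    """Layer 1: map attribute/value pairs onto XML elements."""
    root = ET.Element("resource")
    for name, value in record.items():
        child = ET.SubElement(root, "attr", name=name)
        child.text = str(value)
    return root


def xml_to_ascii(element: ET.Element) -> bytes:
    """Layer 2: serialize the XML tree as ASCII bytes."""
    return ET.tostring(element, encoding="us-ascii")


def ascii_to_record(data: bytes) -> dict:
    """Undo both layers: bytes -> XML tree -> attribute/value pairs."""
    root = ET.fromstring(data)
    return {child.get("name"): child.text for child in root}
```

In this sketch a numeric value comes back as a string after the round trip, because the XML layer as used here carries only text. That kind of detail is one reason the layering matters when comparing syntaxes.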
16
A Protocol can have Multiple APIs
E.g., TCP/IP
TCP/IP APIs include BSD sockets, Winsock, System V streams, …
The protocol provides interoperability: programs using different APIs can exchange information
I don’t need to know remote user’s API
[Figure: two applications, one using the WinSock API and one using the Berkeley Sockets API, communicating via the TCP/IP protocol (reliable byte streams).]
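As a rough analogy in Python, the sketch below has a server written against the standard library's file-like `socketserver` interface and a client written against the low-level `socket` interface. Python's `socketserver` is itself built on `socket`, so this only approximates the BSD sockets versus Winsock situation; the point it illustrates is that neither side needs to know which interface the other used, because both speak TCP. The names `UpperHandler`, `ask`, and `demo` are hypothetical.

```python
import socket
import socketserver
import threading


class UpperHandler(socketserver.StreamRequestHandler):
    """Server side, written against the file-like socketserver API."""

    def handle(self) -> None:
        line = self.rfile.readline()
        self.wfile.write(line.upper())


def ask(host: str, port: int, text: str) -> str:
    """Client side, written against the low-level socket API."""
    with socket.create_connection((host, port)) as conn:
        conn.sendall(text.encode("ascii") + b"\n")
        reply = b""
        while not reply.endswith(b"\n"):
            chunk = conn.recv(1024)
            if not chunk:
                break
            reply += chunk
    return reply.decode("ascii").rstrip("\n")


def demo() -> str:
    # Port 0 asks the operating system for any free port.
    with socketserver.TCPServer(("127.0.0.1", 0), UpperHandler) as server:
        port = server.server_address[1]
        thread = threading.Thread(target=server.serve_forever, daemon=True)
        thread.start()
        try:
            return ask("127.0.0.1", port, "hello grid")
        finally:
            server.shutdown()
```

Calling `demo()` starts the server on a loopback port, sends one line from the client, and returns the server's reply.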
17
An API can have Multiple Protocols
E.g., Message Passing Interface
MPI provides portability: any correct program compiles & runs on a platform
Does not provide interoperability: all processes must link against same SDK
  – E.g., MPICH and LAM versions of MPI
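[Figure: two applications, each written to the MPI API. One is linked with the LAM SDK, which speaks the LAM protocol over TCP/IP; the other is linked with the MPICH-P4 SDK, which speaks the MPICH-P4 protocol over TCP/IP. The two protocols use different message formats, exchange sequences, etc.]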
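The distinction between portability and interoperability can be sketched with one abstract interface and two implementations that put different bytes on the wire. Everything here is hypothetical: `MessagePassing`, `BinarySDK`, and `JsonSDK` are invented for the illustration and say nothing about how MPICH or LAM actually encode messages.

```python
import json
import struct
from abc import ABC, abstractmethod


class MessagePassing(ABC):
    """The API: routine names, argument types, and behavior only."""

    @abstractmethod
    def pack(self, rank: int, data: str) -> bytes: ...

    @abstractmethod
    def unpack(self, wire: bytes) -> tuple: ...


class BinarySDK(MessagePassing):
    """One implementation: 4-byte big-endian rank, then UTF-8 data."""

    def pack(self, rank, data):
        return struct.pack("!I", rank) + data.encode("utf-8")

    def unpack(self, wire):
        (rank,) = struct.unpack_from("!I", wire)
        return rank, wire[4:].decode("utf-8")


class JsonSDK(MessagePassing):
    """Another implementation: a JSON object encoded as UTF-8."""

    def pack(self, rank, data):
        return json.dumps({"rank": rank, "data": data}).encode("utf-8")

    def unpack(self, wire):
        message = json.loads(wire.decode("utf-8"))
        return message["rank"], message["data"]


def application(sdk: MessagePassing) -> tuple:
    """Portable: written once against the API, runs on either SDK."""
    return sdk.unpack(sdk.pack(3, "hello"))


def interoperates(sender: MessagePassing, receiver: MessagePassing) -> bool:
    """True only if the receiver recovers what the sender packed."""
    try:
        return receiver.unpack(sender.pack(3, "hello")) == (3, "hello")
    except (ValueError, KeyError, TypeError, struct.error):
        return False
```

The same `application` function runs unchanged on either implementation, yet a message packed by one cannot be read by the other, which is why all processes must link against the same SDK.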
18
Back to Grids: The Programming & Systems Problems
Approaches to problem solving
  – Data Grids, distributed computing, peer-to-peer, collaboration grids, …
Structuring and writing programs (the Programming Problem)
  – Abstractions, tools
Enabling resource sharing across distinct institutions (the Systems Problem)
  – Resource discovery, access, reservation, allocation; authentication, authorization, policy; communication; fault detection and notification; …
20
Aspects of the Programming Problem
Need for abstractions and models to add to speed/robustness/etc. of development
  – E.g., OO abstractions, MPI for messaging
Need for code/tool sharing to allow reuse of code components developed by others
  – E.g., MPI allows reuse of message passing
  – E.g., standard profilers, debuggers
Primary need is for standard programming environments: APIs and SDKs
21
Aspects of the Systems Problem
Need for interoperability when different groups want to share resources
  – Diverse components, policies, mechanisms
  – E.g., standard notions of identity, means of communication, resource descriptions
Need for shared infrastructure services to avoid repeated development, installation
  – E.g., one port/service for remote access to computing, not one per tool/application
  – E.g., Certificate Authorities: expensive to run
Need standard protocols, services, syntax
22
I.e., Standard APIs and Protocols are Both Important: For Different Reasons
Standard APIs/SDKs are important
  – They enable application portability
  – But w/o standard protocols, interoperability is hard (every SDK speaks every protocol?)
Standard protocols are important
  – Enable cross-site interoperability
  – Enable shared infrastructure
  – But w/o standard APIs/SDKs, application portability is hard (different platforms access protocols in different ways)
23
Grid “Architecture”
We now proceed to analyze Grid systems with respect to standards
Identify key areas where protocols, services, APIs, and SDKs can occur
Result is a layered protocol architecture
We assert this can be useful as a means of describing and structuring Grid Forum activities
24
Layered Grid Architecture (By Analogy to Internet Architecture)
Grid layers, top to bottom:
  – Application
  – User: “Specialized services”: user- or appln-specific distributed services
  – Collective: “Managing multiple resources”: ubiquitous infrastructure services
  – Resource: “Sharing single resources”: negotiating access, controlling use
  – Connectivity: “Talking to things”: communication (Internet protocols) & security
  – Fabric: “Controlling things locally”: access to, & control of, resources
Internet Protocol Architecture layers, top to bottom:
  – Application
  – Transport
  – Internet
  – Link
25
Protocols, Services, and Interfaces Occur at Each Level
[Figure: layered diagram with the following labels. Applications; Languages/Frameworks; User Service APIs and SDKs, User Services, User Service Protocols; Collective Service APIs and SDKs, Collective Services, Collective Service Protocols; Resource APIs and SDKs, Resource Services, Resource Service Protocols; Connectivity APIs, Connectivity Protocols; Local Access APIs and Protocols; Fabric Layer.]
26
An Aside on Terminology
Is this an “architecture” or just a “categorization” or “taxonomy”?
  – A matter of opinion (cf. IAB: “Many members of the Internet community would argue that there is no architecture”)
  – Our opinion: it is somewhere in between, but is useful regardless
Becomes more architectural if/as we define “necessary” pieces at each level
Note that the protocols say nothing about the SDK/API architecture (& vice versa)
27
Important Points
We build on Internet protocols
  – Communication, routing, name resolution, etc.
“Layering” here is conceptual, does not imply constraints on who can call what
  – Protocols/services/APIs/SDKs will, ideally, be largely self-contained
  – But some things are fundamental: e.g., communication and security
  – But, advantageous for higher-level functions to use common lower-level functions
28
Example: User Portal
  – Application: Web Portal
  – User: Source code discovery, application configuration
  – Collective: Brokering, co-allocation, certificate authorities
  – Resource: Access to data, access to computers, access to network performance data
  – Connectivity: Communication, service discovery (DNS), authentication, authorization, delegation
  – Fabric: Storage systems, schedulers
[Figure: a Compute Resource reached via SDK, API, and Access Protocol; a Source Code Repository reached via SDK, API, and Lookup Protocol.]
29
Example: High-Throughput Computing System
  – Application: High Throughput Computing System
  – User: Dynamic checkpoint, job management, failover, staging
  – Collective: Brokering, certificate authorities
  – Resource: Access to data, access to computers, access to network performance data
  – Connectivity: Communication, service discovery (DNS), authentication, authorization, delegation
  – Fabric: Storage systems, schedulers
[Figure: a Compute Resource reached via SDK, API, and Access Protocol; a Checkpoint Repository reached via SDK, API, and C-point Protocol.]
30
Standards, Again: Intergrid Protocols and Grid APIs
One or many protocols?
  – No one “right” protocol for any one function
  – But: interoperability requires that we define and commit to core “Intergrid” protocols
  – Definition: “A resource is Grid-enabled if it speaks Intergrid protocols”
One or many APIs and SDKs?
  – Many APIs, SDKs, programming models can target Intergrid protocols
  – But: code sharing requires standards
  – So, e.g., “standard Grid collaboration APIs”
31
Questions for the Grid Forum
Is the “Grid architecture” described here a useful framework?
  – Could it be made more useful?
  – Are there things that it fails to capture or misrepresents?
Would it be a useful discipline for us to try to place GF efforts in this context?
  – E.g., be clear whether we are defining a protocol, service, API, SDK, syntax (or something else: which is fine, too)
  – E.g., explain (and argue about) where in the stack different pieces fit
32
Questions for the Grid Forum
Are some things easier, or more important, to standardize than others?
  – Protocols vs. APIs vs. syntax
  – Connectivity vs. resource vs. collective vs. user layer protocols/services/APIs/SDKs
I would suggest that
  – Items lower in the stack tend to have broader impact, but standards are useful at all levels
  – Size of community affected (e.g., number of adopters) is the key figure of merit
  – We should ask explicitly for such an analysis as part of a WG charter
33
Questions for the Grid Forum
Can we define core “intergrid protocols”?
  – I.e., instantiate (lower) layers in the diagram
  – We have avoided it until now (implies choice)
  – Until we do, interoperability is difficult
Possible approaches
  – Avoid seeking consensus, instead standardize where it makes sense and where we can; rely on sense of “best practice” emerging
  – Or, create an architecture WG, charged with defining requirements for “core protocols”??
  – I think the latter is better, but am unsure if it can work
34
Summary
Grids are about [large-scale] sharing
  – Hence require standard protocols to enable interoperability and shared infrastructure
  – And, of course, standard APIs and SDKs to enable portability & code sharing
  – Both important; but very different
Well defined architecture can help understanding & progress
  – Provides a framework for figuring out where the pieces fit
  – Facilitates asking questions such as “where are standards particularly important?”
35
Questions?