Post on 05-Jan-2016
Enabling eScience: Open Software, Standards, Infrastructure
Ian Foster
Argonne National Laboratory
University of Chicago
Globus Alliance
www.mcs.anl.gov/~foster
UK eScience Meeting, Nottingham, September 2, 2004
The Grid Meets the BBC
“The Grid is an international project that looks in detail at a terrorist cell operating on a global level and a team of American and British counter-terrorists who are tasked to stop it”
Gareth Neame, BBC's head of drama
A Better Characterization?
“The Grid is an international project that looks in detail at scientific collaborations operating on a global level and a team of computer scientists who are tasked to enable them”
But perhaps not as telegenic?
eScience & Grid: 6 Theses
1. Scientific progress depends increasingly on large-scale distributed collaborative work
2. Such distributed collaborative work raises challenging problems of broad importance
3. Any effective attack on those problems must involve close engagement with applications
4. Open software & standards are key to producing & disseminating required solutions
5. Shared software & service infrastructure are essential application enablers
6. A cross-disciplinary community of technology producers & consumers is needed
Implication: A Problem-Driven, Collaborative R&D Methodology
[Diagram: a Design → Build → Deploy → Apply → Analyze cycle linking Computer Science, Software & Standards, Infrastructure, and Discipline Advances]
Overview
How are we doing? Software, Standards, Infrastructure, Community
An advertorial, and a request for input: Globus Toolkit version 4
Summary
Why Open Software Matters
eScience requires sophisticated functionality but is a small “market”; commercial software does not meet needs
Open software can help jumpstart development by reducing barriers to entry
Encourage adoption of common approaches to key technical problems
Enable a broad Grid technology ecosystem
A basis for international cooperation
A basis for cooperation with industry
“Open Software” is Ultimately about Community
Contributors: design, development, packaging, testing, documentation, training, support; united by a common architectural perspective
Users: may be major contributors via, e.g., testing
Governance structure: to determine how the software evolves
Processes for coordinating all these activities: packaging, testing, reporting, …
An ecosystem of complementary components, enabled by an appropriately open architecture
“Ecosystem”?
Not a monoculture … or a Cambrian explosion … but a web of components
E.g., Globus Alliance & Toolkit (Argonne, USC/ISI, Edinburgh, PDC, NCSA)
An international partnership dedicated to creating & disseminating high-quality open source Grid technology: the Globus Toolkit
Design, engineering, support, governance
Academic Affiliates make major contributions: EU: CERN, MPI, Poznan, INFN, etc.; AP: AIST, TIT, Monash, etc.; US: SDSC, TACC, UCSB, UW, etc.
Significant industrial contributions & adoption
1000s of users worldwide, many of whom contribute
Broader Ecosystem*: Example Complementary Projects
NSF Middleware Initiative: packaging, testing, additional components
Virtual Data Toolkit (GriPhyN + PPDG): GT, Condor, Virtual Data System, etc.
EGEE and “gLite”: close collaboration with Globus + Condor
TeraGrid, Earth System Grid, NEESgrid, …: consume and produce components
Open Middleware Infrastructure Institute: collaboration on components, testing, etc.
* See tutorial by Lee Liming: AHM, GGF, SC’2004.
Broader Ecosystem: E.g., NMI Distributed Test Facility (NSF Middleware Initiative’s GRIDS Center)
How Grid Software Works: NSF Network for Earthquake Engineering Simulation (NEES)
Transform our ability to carry out research vital to reducing vulnerability to catastrophic earthquakes
Building a NEES Collaboratory: What the User Wants
Secure, reliable, on-demand access to data, software, people, and other resources (ideally all via a Web browser)
How it Really Happens (A Simplified View)
Users work with client applications; application services organize VOs & enable access to other services; collective services aggregate &/or virtualize resources; resources implement standard access & management interfaces
[Diagram: Web browser, Web portal, compute servers, data catalog, database services, data viewer tool, chat tool, certificate authority, credential repository, registration service, simulation tool, cameras, telepresence monitor]
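The four-layer picture (client applications over application services over collective services over resources) can be sketched in code. This is a conceptual sketch only; every class and method name below is an illustrative assumption, not the Globus Toolkit API:

```python
# Conceptual sketch of the layered Grid architecture described above.
# All names are illustrative assumptions, not the Globus Toolkit API.

class Resource:
    """A resource exposing a standard access & management interface."""
    def __init__(self, name):
        self.name = name

    def describe(self):
        return {"name": self.name, "state": "available"}

class CollectiveService:
    """Aggregates &/or virtualizes a set of resources (e.g. an index)."""
    def __init__(self, resources):
        self.resources = resources

    def discover(self):
        return [r.describe() for r in self.resources]

class ApplicationService:
    """Organizes a VO and mediates access to the other services."""
    def __init__(self, vo_name, collective):
        self.vo_name = vo_name
        self.collective = collective

    def portal_view(self):
        # What a client application (e.g. a Web portal) would render.
        return {"vo": self.vo_name, "resources": self.collective.discover()}

index = CollectiveService([Resource("compute-server"), Resource("data-catalog")])
portal = ApplicationService("NEES", index)
print(portal.portal_view())
```

The point of the layering is that the portal never talks to a compute server directly: it goes through collective services, which can aggregate and virtualize many sites behind one interface.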
How it Really Happens (without Grid Software)
[Diagram: the same components, now wired together through custom interfaces A through E, with the same layer annotations]
Component sources: Application Developer 10, Off the Shelf 12, Globus Toolkit 0, Grid Community 0
How it Really Happens (with Grid Software)
[Diagram: Web browser, CHEF portal, CHEF chat teamlet, data viewer tool, MyProxy, certificate authority, Globus MCS/RLS, Globus Index Service, Globus GRAM on compute servers, Globus DAI on database services, simulation tool, cameras, telepresence monitor, with the same layer annotations]
Component sources: Application Developer 2, Off the Shelf 9, Globus Toolkit 5, Grid Community 3
NEESgrid Multisite Online Simulation Test (July 2003)
[Chart: number of participants from 8:00 to 18:30 at Illinois (UIUC), Colorado, and the Illinois simulation site]
NEESgrid Summary
A successful “turn of the crank”: s/w produced & deployed on time & on budget, and new applications enabled
A producer as well as a consumer of Grid s/w
Many sociopolitical “learning opportunities”
4 tasks: develop s/w, engineer s/w, elicit requirements, educate community
Experiment-driven deployment™ was key
“No victory is final”: challenges remain
Hand off s/w to a separate operations team
Sharing of facilities, data: politically charged
Software: Summary
Good software arises from trying to solve real problems in real projects, & then generalizing
E.g., Globus: security, job submission/mgmt, data movement, monitoring, etc.
The result is solutions that make sense within a wide variety of applications
Solve real problems, but not every problem: the resulting software is not a “turnkey” solution for any significant application
“Turnkey” solutions require integration; factoring can extract higher-level “solutions”
Example “Solutions”
Portal-based User Registration System (PURSE): Web-based A&A management. Source: Earth System Grid, PDC
Lightweight Data Replicator: data replication management. Source: LIGO
Workflow execution & management: DAGMan + Condor-G + Globus components. Source: Virtual Data Toolkit
Service monitoring & fault detection. Source: Earth System Grid
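The DAGMan pattern above (run a workflow as a directed acyclic graph of jobs) reduces to: execute a task only once all of its parents have completed. A minimal Python sketch of that rule, with hypothetical task names; this is not the DAGMan implementation:

```python
# Minimal sketch of DAGMan-style execution: a task runs only after all
# of its parent tasks have completed. Not the real DAGMan code.

def run_dag(deps, run_task):
    """deps maps task -> set of parent tasks; run_task executes one task."""
    done, order = set(), []
    remaining = dict(deps)
    while remaining:
        ready = [t for t, parents in remaining.items() if parents <= done]
        if not ready:
            raise ValueError("cycle detected in workflow DAG")
        for t in sorted(ready):          # deterministic order for the sketch
            run_task(t)
            done.add(t)
            order.append(t)
            del remaining[t]
    return order

# Hypothetical three-stage workflow: stage data in, compute, stage out.
deps = {"stage_in": set(), "compute": {"stage_in"}, "stage_out": {"compute"}}
order = run_dag(deps, run_task=lambda t: None)
print(order)  # -> ['stage_in', 'compute', 'stage_out']
```

In the real stack, `run_task` would hand each node to Condor-G, which in turn submits it through Globus components to a remote site.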
Overview
How are we doing? Software, Standards, Infrastructure, Community
An advertorial, and a request for input: Globus Toolkit version 4
Summary
“Standards”: Examples of Success
Grid Security Infrastructure: broadly used, multiple implementations, WS-Security; a rich Grid security ecosystem, with linkages to MyProxy, OTP, KX509, Shibboleth, …
GridFTP: broadly used, multiple implementations
WSDL/SOAP: facilitating service-oriented architectures
OGSI/WSRF: many find they encode useful patterns, behaviors
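One of the “useful patterns” OGSI/WSRF encode is resource lifetime management: a client creates a stateful resource, may extend its termination time, and the hosting container reclaims it once that time passes. A schematic Python sketch of the pattern only; the class and method names are illustrative, and this is not the WSRF wire protocol:

```python
import time

# Schematic sketch of WSRF-style resource lifetime management.
# Illustrates the pattern only; names are assumptions, not the WSRF API.

class ResourceHome:
    def __init__(self):
        self._resources = {}
        self._next_id = 0

    def create(self, state, lifetime_s, now=None):
        """Create a stateful resource with an initial termination time."""
        now = time.time() if now is None else now
        rid = self._next_id
        self._next_id += 1
        self._resources[rid] = {"state": state, "expires": now + lifetime_s}
        return rid

    def set_termination_time(self, rid, expires):
        """Client-requested extension (or shortening) of the lifetime."""
        self._resources[rid]["expires"] = expires

    def sweep(self, now=None):
        """Destroy resources whose termination time has passed."""
        now = time.time() if now is None else now
        expired = [rid for rid, r in self._resources.items()
                   if r["expires"] <= now]
        for rid in expired:
            del self._resources[rid]
        return expired

home = ResourceHome()
job = home.create({"job": "simulate"}, lifetime_s=60, now=0.0)
home.sweep(now=30.0)   # still alive
home.sweep(now=61.0)   # reclaimed: termination time has passed
```

The design point is soft-state cleanup: if a client crashes and stops renewing, the container eventually reclaims the resource without any explicit destroy call.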
Standards: Status
Open Grid Services Architecture (OGSA): the lighthouse by which we steer; defines requirements & priorities, but far from complete
W3C, OASIS, GGF, DMTF, IETF: good things are happening in many areas: WS-Agreement, DAIS, SRM, …
But for those building systems today? Problem areas: monitoring, policy, data, etc.; ad hoc approaches will cost us big later
“Experiment-driven deployment” on an international scale to drive interoperability of infrastructure, code
Overview
How are we doing? Software, Standards, Infrastructure, Community
An advertorial, and a request for input: Globus Toolkit version 4
Summary
Infrastructure
Broadly deployed services in support of virtual organization formation and operation: authentication, authorization, discovery, …
Services, software, and policies enabling on-demand access to important resources: computers, databases, networks, storage, software services, …
Operational support for 24x7 availability
Integration with campus infrastructures
Distributed, heterogeneous, instrumented systems can be wonderful CS testbeds
Grid2003: An Operational Grid
28 sites (2100-2800 CPUs) & growing
400-1300 concurrent jobs
8 substantial applications + CS experiments
Running since October 2003
[Map: participating sites, including Korea]
http://www.ivdgl.org/grid2003
Grid2003 Software Stack (“Virtual Data Toolkit”)
Application
Chimera Virtual Data System
DAGMan and Condor-G
Globus Toolkit: GSI, GRAM, GridFTP, etc.
Site schedulers and file systems
Clusters and storage systems
Three levels of deployment:
+ Site services: GRAM, GridFTP, etc.
+ Global & virtual organization services
+ IGOC: iVDGL Grid Operations Center
Grid2003 Metrics
Metric                                  | Target   | Achieved
Number of CPUs                          | 400      | 2762 (28 sites)
Number of users                         | > 10     | 102 (16)
Number of applications                  | > 4      | 10 (+CS)
Number of sites running concurrent apps | > 10     | 17
Peak number of concurrent jobs          | 1000     | 1100
Data transfer per day                   | > 2-3 TB | 4.4 TB max
Grid2003 Applications To Date
CMS proton-proton collision simulation
ATLAS proton-proton collision simulation
LIGO gravitational wave search
SDSS galaxy cluster detection
ATLAS interactive analysis
BTeV proton-antiproton collision simulation
SnB biomolecular analysis
GADU/Gnare genome analysis
Various computer science experiments
www.ivdgl.org/grid2003/applications
Example Grid3 Application: NVO Mosaic Construction
NVO/NASA Montage: a small (1200 node) workflow
Construct custom mosaics on demand from multiple data sources
User specifies projection, coordinates, size, rotation, spatial sampling
Work by Ewa Deelman et al., USC/ISI and Caltech
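A mosaic request of this kind (projection, coordinates, size, rotation, sampling) fans out into one job per input image plus a final combine step, which is where the 1200-node workflow comes from. An illustrative Python sketch; the parameter and job names are hypothetical, not Montage's actual interface:

```python
# Illustrative sketch of how a mosaic request fans out into a workflow:
# one reprojection job per input image, then a single combine job.
# Parameter, file, and job names are hypothetical, not Montage's API.

def plan_mosaic(images, projection, center, size_deg, rotation=0.0):
    jobs = [{"op": "reproject", "input": img, "projection": projection,
             "center": center, "size_deg": size_deg, "rotation": rotation}
            for img in images]
    jobs.append({"op": "combine",
                 "inputs": [f"reproj_{i}" for i in range(len(images))]})
    return jobs

plan = plan_mosaic(["image_001.fits", "image_002.fits"],
                   projection="TAN", center=(10.68, 41.27), size_deg=1.0)
print(len(plan))  # -> 3 jobs: two reprojections plus one combine
```

The resulting job list is exactly the kind of DAG that DAGMan and Condor-G then execute across Grid3 sites.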
Next Step: Open Science Grid
U.S. (international?) consortium to provide services to a broad set of sciences
Grid3 as a starting point, expanding to include many more sites
A major focus is the MOU/SLA structure required to sustain & scale operations: resource providers, resource consumers, virtual organizations
We hope to collaborate with TeraGrid, EGEE, UK NGS, etc.
Infrastructure: Summary
Encouraging progress:
A real understanding of how to operate Grid infrastructures is emerging
Production infrastructures are appearing and are being relied upon for real science
Significant areas of concern remain:
Security is going to get harder
International interoperability still elusive
We haven’t got the right model for sustained infrastructure development & support
Overview
How are we doing? Software, Standards, Infrastructure, Community
An advertorial, and a request for input: Globus Toolkit version 4
Summary
Community
Big picture is extremely positive:
The “eScience”/“Grid” community is large, enthusiastic, smart, and diverse
Significant exchange of ideas, software, personnel, experiences
Real application-CS cooperation
We can do better in various specific areas:
Not clear we’re always focusing on the real problems, which are often viewed as “mundane”
CS community could be even more engaged
Software development is a community effort
Overview
How are we doing? Software, Standards, Infrastructure, Community
An advertorial, and a request for input: Globus Toolkit version 4
Summary
What’s New in GT 4.0 (January 31, 2005)
For all: additions: data, security, execution, XIO, …; improved packaging, testing, performance, usability, doc, standards compliance (phew); WS components ready for broader use
For the end user: more complementary tools & solutions; C, Java, Python APIs; command line tools
For the developer: Java (Axis/Tomcat) hosting greatly improved; Python (pyGlobus) hosting for the first time
Apache Axis Web Services Container
Good news for Java WS developers: GT4.0 works with standard Axis* and Tomcat*
GT provides Axis-loadable libraries, handlers
Includes useful behaviors such as inspection, notification, lifetime mgmt (WSRF); others implement GRAM, etc.
Major Globus contributions to Apache: ~50% of WS-Addressing code, ~15% of WS-Security code, many bug fixes; WSRF code a possible next contribution
* Modulo Axis and Tomcat release cycle issues
[Diagram: Axis container hosting Security, Addressing, GT bits, and App bits]
Standards Compliance
Web services: WS-I compliance; all interfaces support WS-I Basic Profile, modulo use of WS-Addressing
Security: a) WS-I Basic Security Profile (plaintext); b) IETF RFC 3820 proxy certificates
GridFTP: GGF GFD.020
Others in progress & being tracked: WSRF (OASIS), WS-Addressing (W3C), OGSA-DAI (GGF), RLS (GGF)
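The RFC 3820 proxy certificates mentioned above are the mechanism behind GSI single sign-on and delegation: the user's end-entity certificate signs a short-lived proxy, which may in turn sign further proxies, and a verifier walks the issuer links back to a known identity. A schematic Python sketch of that chain walk only; this is not a real X.509 implementation, and CA validation of the end-entity certificate is elided:

```python
# Schematic sketch of the RFC 3820 proxy-certificate idea: each link in
# the chain must be unexpired and issued by the next certificate up.
# Not a real X.509 implementation; signatures and CA checks are elided.

def verify_chain(chain, known_identities, now):
    """chain is leaf-first: [{'subject', 'issuer', 'expires'}, ...]."""
    for cert, issuer_cert in zip(chain, chain[1:]):
        if cert["expires"] <= now:
            return False                      # this link has expired
        if cert["issuer"] != issuer_cert["subject"]:
            return False                      # broken issuer link
    root = chain[-1]
    return root["expires"] > now and root["subject"] in known_identities

# Hypothetical identities: a long-lived user cert and a 12-hour proxy.
user = {"subject": "/C=UK/CN=alice", "issuer": "/CN=eScience CA",
        "expires": 1000.0}
proxy = {"subject": "/C=UK/CN=alice/CN=proxy", "issuer": "/C=UK/CN=alice",
         "expires": 12.0}

assert verify_chain([proxy, user], {"/C=UK/CN=alice"}, now=5.0)
assert not verify_chain([proxy, user], {"/C=UK/CN=alice"}, now=20.0)
```

The design payoff is that only the short-lived proxy travels to remote services, so a compromise exposes hours of credential lifetime rather than the user's long-term key.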
Globus Ecosystem (Just a Few Examples Listed Here)
Tools provide higher-level functionality: Nimrod-G, MPICH-G2, Condor-G, Ninf-G; NTCP telecontrol; GT4IDE Eclipse IDE
Packages integrate GT with other s/w: VDT, NMI, CTSS, NEESgrid, ESG
Solutions package a set of functionality: VO management, monitoring, replica mgmt
Documentation, e.g., Borja Sotomayor’s tutorial
GT4.0 Release Schedule
Date         | Stability level            | Features added after? | Public interfaces changed after?
Aug 3        | Alpha                      | Yes                   | Yes
Oct 15       | Full-featured development  | No                    | Yes, but only if significant benefits
Dec 3        | Beta-quality development   | No                    | No
Jan 31, 2005 | Stable release (FINAL)     | No                    | No
We’re Getting a Lot of Help, But Could Do with a Lot More
Testing and feedback: users, developers, deployers: plan to use the software now & provide feedback
Tell us what is missing, what performance you need, what interfaces & platforms, …
Ideally, also offer to help meet needs (-:
Related software, solutions, documentation: adapt your tools to use GT4; develop new GT4-based components; develop GT4-based solutions; develop documentation components
Overview
How are we doing? Software, Standards, Infrastructure, Community
An advertorial, and a request for input: Globus Toolkit version 4
Summary
eScience & Grid: 6 Theses
1. Scientific progress depends increasingly on large-scale distributed collaborative work
2. Such distributed collaborative work raises challenging problems of broad importance
3. Any effective attack on those problems must involve close engagement with applications
4. Open software & standards are key to achieving a critical mass of contributors
5. Shared software & service infrastructure are essential application enablers
6. A cross-disciplinary community of technology producers & consumers is vital
Overall, We are Doing Well
Communities & individuals are, increasingly, using the Grid to advance their science
Broad consensus on many key architecture concepts, if not always their implementation
Significant base of open source software, widely used in applications & infrastructure
Service-oriented architecture facilitates cooperation on software development & code reuse
Grid standards are making a difference on a daily basis: e.g., GSI, GridFTP
Overall, We are Doing Well (2)
A real understanding of how to operate Grid infrastructures is emerging
Production infrastructures are appearing and are being relied upon for real science
Productive international cooperation is occurring at many levels
A vibrant community has formed and shows no signs of slowing down
Real connections have been formed between computer science & applications
Problem-Driven, Collaborative R&D Methodology
[Diagram: the Design → Build → Deploy → Apply → Analyze cycle linking Computer Science, Software & Standards, Infrastructure, Discipline Advances, and the Global Community]
Software Ecosystem
Not a monoculture … or a Cambrian explosion … but a web of components
We Can Certainly Do Better
Be smarter about how we work with users: not enough to point people at a manual
Treat s/w as shared infrastructure, to be developed, engineered, tested, improved
Be honest about costs & time scales, expertise
Establish real collaboration on software: partition the space of what to do: it’s large; partners, not customers or competitors
Tackle process issues explicitly: standardize on packaging, testing, support; deployment, operations, security issues
We Can Certainly Do Better (2)
Aspire to code reuse & interoperability: interoperability layers are not the answer; recognize the costs of noninteroperability
Focus standards efforts on the real problems faced when sharing software & infrastructure: quit fiddling with Web services infrastructure!
Build sustained, critical-mass teams: the problems are hard and require time & expertise
Build and operate large-scale Grids with real application groups to drive all of this, with explicit O(5) year focus and goals
Thanks, in particular, to:
Carl Kesselman and Steve Tuecke, my long-time Globus co-conspirators
Kate Keahey, Lee Liming, Jennifer Schopf, Gregor von Laszewski, Mike Wilde @ Argonne
Globus Alliance members at Argonne, U.Chicago, USC/ISI, Edinburgh, PDC, NCSA
Miron Livny, U.Wisconsin Condor project
Other partners in Grid technology, application, & infrastructure projects
DOE, NSF (esp. NMI program), NASA, IBM, Microsoft for generous support
For More Information
Globus Alliance: www.globus.org
Global Grid Forum: www.ggf.org
Open Science Grid: www.opensciencegrid.org
Background information: www.mcs.anl.gov/~foster
GlobusWORLD 2005: Feb 7-11, Boston
The Grid, 2nd Edition: www.mkp.com/grid2
Extra Slides
Globus Toolkit: A Brief History
GT 1.0 (1998) to 2.0 (2002): PKI-based Grid Security Infrastructure; execution (GRAM), data (GridFTP), info (MDS); gradual introduction of support processes
GT 3.0 (June 2003) and 3.2 (Feb 2004): international collaboration; higher-level services: replica location, file transfer, registry, credential repository; refactoring of GT mechanisms into a WS framework; most production deployments recommended to use pre-WS (“GT2.4”) components
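The GRAM execution interface that runs through this history boils down to a simple client-side contract: submit a job description, get back a handle, poll its state until it is done. A schematic Python sketch of that lifecycle; the class, method, and state names are hypothetical, not the actual GRAM API:

```python
# Schematic sketch of a GRAM-style submit/poll job lifecycle.
# Class, method, and state names are hypothetical, not the GRAM API.

class JobManager:
    STATES = ("PENDING", "ACTIVE", "DONE")

    def __init__(self):
        self._jobs = {}

    def submit(self, executable, arguments=()):
        """Accept a job description and return an opaque handle."""
        handle = f"job-{len(self._jobs)}"
        self._jobs[handle] = {"spec": (executable, tuple(arguments)),
                              "state": 0}
        return handle

    def poll(self, handle):
        """Report the job's current lifecycle state."""
        return self.STATES[self._jobs[handle]["state"]]

    def advance(self, handle):
        """Stand-in for the scheduler moving the job through its lifecycle."""
        job = self._jobs[handle]
        job["state"] = min(job["state"] + 1, len(self.STATES) - 1)

jm = JobManager()
h = jm.submit("/bin/simulate", ["--steps", "100"])
while jm.poll(h) != "DONE":
    jm.advance(h)
print(jm.poll(h))  # -> DONE
```

In the toolkit itself this contract is what moved from a custom protocol in GT2 to a Web-services rendering in GT3/GT4; the submit/poll shape stayed the same.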
NEESgrid Software Details
Inputs: essentially all of GT3.2 (GSI, GridFTP, GRAM, MDS, …; a coherent architecture helps!); CHEF, Creare Data Turbine, OpenSEES
Custom development: NTCP telecontrol, data management, etc.
Integration: all of the above, and more
Outputs: the NEESgrid system (to the NEES consortium); NTCP components (to the Globus Toolkit); …