The Grid: Beyond the Hype
Ian Foster, Argonne National Laboratory, University of Chicago, Globus Alliance
Seminar, Duke University, September 14, 2004

TRANSCRIPT

Slide 1: The Grid: Beyond the Hype
Ian Foster, Argonne National Laboratory, University of Chicago, Globus Alliance (www.mcs.anl.gov/~foster). Seminar, Duke, September 14, 2004.

Slide 2: Abstract
Grid technologies and infrastructure support the integration of services and resources within and among enterprises, and thus allow new approaches to problem solving and interaction within distributed, multi-organizational collaborations. Sustained effort by computer scientists and application developers has resulted in the creation of a substantial open source technology base, numerous infrastructure deployments, a vibrant international community, and significant application success stories. Application communities are now working to deploy and apply these technologies more broadly, and thus we encounter ever more challenging requirements for scale, functionality, and robustness. In this talk, I seek to define the nature of the opportunities, achievements, and challenges that underlie this work. I describe the current state and likely evolution of the core technologies, focusing in particular on the Open Grid Services Architecture (OGSA), which integrates Grid technologies with emerging Web services standards. I discuss the implications of these developments for science, engineering, and industry, and present some of the lessons learned within large projects that apply the technologies. I also examine the opportunities and challenges that Grid deployments and applications present for computer scientists.

Slide 3: Grid Hype

Slide 4: The Shape of Grids to Come?
(Figure: the energy grid and the Internet as points of comparison; "Internet hype?")

Slide 5: eScience & Grid: 6 Theses
1. Scientific progress depends increasingly on large-scale distributed collaborative work.
2. Such distributed collaborative work raises challenging problems of broad importance.
3. Any effective attack on those problems must involve close engagement with applications.
4. Open software & standards are key to producing & disseminating required solutions.
5. Shared software & service infrastructure are essential application enablers.
6. A cross-disciplinary community of technology producers & consumers is needed.

Slide 6: Global Knowledge Communities: E.g., High Energy Physics

Slide 7: The Grid
Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations:
1. enable integration of distributed resources,
2. using general-purpose protocols & infrastructure,
3. to achieve better-than-best-effort service.
Slide 8: The Grid (2)
• Dynamically link resources/services
  - from collaborators, customers, eUtilities, … (members of an evolving virtual organization)
• into a virtual computing system
  - a dynamic, multi-faceted system spanning institutions and industries
  - configured to meet instantaneous needs
• with multi-faceted QoX for demanding workloads
  - security, performance, reliability, …

Slide 9: Problem-Driven, Collaborative Research Methodology
(Cycle diagram: design, build, deploy, apply, analyze; linking computer science, software & standards, infrastructure, discipline advances, and the global community.)

Slide 10: Problem-Driven, Collaborative Research Methodology
(Same cycle diagram as slide 9.)

Slide 11: Resource/Service Integration as a Fundamental Challenge
• Discovery: many sources of data, services, and computation; registries organize services of interest to a community.
• Access: data integration activities may require access to, and exploration/analysis of, data at many locations; exploration & analysis may involve complex, multi-step workflows.
• Resource management is needed to ensure progress & arbitrate competing demands.
• Security & policy services must underlie access & management decisions.

Slide 12: Scale Metrics: Participants, Data, Tasks, Performance, Interactions, …
(Chart comparing communities at different scales: the Earth Simulator, an atmospheric chemistry group, LHC experiments, astronomy, gravitational-wave experiments, nuclear experiments, and current accelerator experiments.)

Slide 13: Profound Technical Challenges
How do we, in dynamic, scalable, multi-institutional, computationally & data-rich settings:
• negotiate & manage trust
• access & integrate data
• construct & reuse workflows
• plan complex computations
• detect & recover from failures
• capture & share knowledge
• represent & enforce policies
• achieve end-to-end QoX
• move data rapidly & reliably
• support collaborative work
• define primitive protocols
• build reusable software
• package & deliver software
• deploy & operate services
• operate infrastructure
• upgrade infrastructure
• perform troubleshooting
• etc., etc., etc.

Slide 14: Grid Technologies Address Key Requirements
• Infrastructure (middleware) for establishing, managing, and evolving multi-organizational federations
  - dynamic, autonomous, domain independent
  - on-demand, ubiquitous access to computing, data, and services
• Mechanisms for creating and managing workflow within such federations
  - new capabilities constructed dynamically and transparently from distributed services
  - service orientation, virtualization

Slide 15: Computer Science Contributions
Protocols and/or tools for use in dynamic, scalable, multi-institutional, computationally & data-rich settings, for:
• large-scale distributed system architecture
• cross-org authentication
• scalable community-based policy enforcement
• robust & scalable discovery
• wide-area scheduling
• high-performance, robust, wide-area data management
• knowledge-based workflow generation
• high-end collaboration
• resource & service virtualization
• distributed monitoring & manageability
• application development
• wide-area fault tolerance
• infrastructure deployment & management
• resource provisioning & quality of service
• performance monitoring & modeling

Slide 16: Collaborative Workflow: Virtual Data (www.griphyn.org/chimera)
(Diagram: the virtual data system links data, transformations, and derivations through created-by, execution-of, and consumed-by/generated-by relationships.) Example user questions:
• "I've come across some interesting data, but I need to understand the nature of the corrections applied when it was constructed before I can trust it for my purposes."
• "I've detected a calibration error in an instrument and want to know which derived data to recompute."
• "I want to search an astronomical database for galaxies with certain characteristics. If a program that performs this analysis exists, I won't have to write one from scratch."
• "I want to apply an astronomical analysis program to millions of objects. If the results already exist, I'll save weeks of computation."
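The slide-16 questions all reduce to queries over a derivation graph that records which transformation produced which data from which inputs. The minimal Python sketch below illustrates the idea; the class and method names are hypothetical stand-ins, not the Chimera virtual data language or its API.

    from collections import defaultdict, deque

    class VirtualDataCatalog:
        # Hypothetical in-memory provenance catalog (illustrative only).
        def __init__(self):
            self.derived_from = defaultdict(set)  # input dataset -> outputs derived from it
            self.recipe = {}                      # output dataset -> (transformation, inputs)

        def record_derivation(self, output, transformation, inputs):
            # Record that `output` was generated by `transformation` consuming `inputs`.
            self.recipe[output] = (transformation, tuple(inputs))
            for inp in inputs:
                self.derived_from[inp].add(output)

        def downstream(self, dataset):
            # Everything that depends, directly or transitively, on `dataset`,
            # e.g. all derived data to recompute after a calibration error.
            seen, queue = set(), deque([dataset])
            while queue:
                for out in self.derived_from[queue.popleft()]:
                    if out not in seen:
                        seen.add(out)
                        queue.append(out)
            return seen

    catalog = VirtualDataCatalog()
    catalog.record_derivation("calibrated.fits", "calibrate", ["raw.fits", "flatfield.fits"])
    catalog.record_derivation("galaxy_catalog.db", "extract_sources", ["calibrated.fits"])
    print(catalog.downstream("flatfield.fits"))  # {'calibrated.fits', 'galaxy_catalog.db'}

The same graph, read in the other direction (via the recorded recipes), answers the reuse questions: if a requested product already has a recipe and a materialized copy, it need not be recomputed.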
Slide 17: Adaptive Unstructured Multicast
UMM: a dynamically adaptive, unstructured multicast overlay (M. Ripeanu et al.). The overlay is organized in three layers: application overlay, base overlay, and physical topology (nodes A through E in the figure).
(Figure: relative delay penalty (RDP) and maximum link stress versus time for an experiment in which 10 nodes fail and then rejoin 900 s later; curves show maximum RDP, 95th- and 90th-percentile RDP, and link stress, with reference lines at RDP = 1 and RDP = 2.)

Slide 18: Problem-Driven, Collaborative Research Methodology
(Same cycle diagram as slide 9.)

Slide 19: Open Standards & Software
• Standardized & interoperable mechanisms for secure & reliable:
  - authentication, authorization, policy, …
  - representation & management of state
  - initiation & management of computation
  - data access & movement
  - communication & notification
• Good-quality open source implementations to accelerate adoption & development
  - e.g., the Globus Toolkit

Slide 20: Evolution of Open Grid Standards and Software
(Timeline, 1990 to 2010, showing increasing functionality and standardization: custom solutions; research; the Globus Toolkit as a de facto standard with a single implementation built on Internet standards; Web services, etc.; the Open Grid Services Architecture as real standards with multiple implementations; managed, shared virtual systems.)

Slide 21: WS Core Enables Frameworks: E.g., Resource Management
A layered stack:
• Web services (WSDL, SOAP, WS-Security, WS-ReliableMessaging, …)
• WS-Resource Framework & WS-Notification (resource identity, lifetime, inspection, subscription, …)
• WS-Agreement (agreement negotiation) and WS Distributed Management (lifecycle, monitoring, …)
• Applications of the framework (compute, network, and storage provisioning; job reservation & submission; data management; application service QoS; …)

Slide 22: WSRF & WS-Notification
• Naming and bindings (basis for virtualization)
  - every resource can be uniquely referenced, and has one or more associated services for interacting with it
• Lifecycle (basis for fault-resilient state management)
  - resources created by services following the factory pattern
  - resources destroyed immediately or on a schedule
• Information model (basis for monitoring, discovery)
  - resource properties associated with resources
  - operations for querying and setting this information
  - asynchronous notification of changes to properties
• Service groups (basis for registries, collective services)
  - group membership rules & membership management
• Base Fault type

Slide 23: Bringing It All Together
Scenario: resource management & scheduling at the service level, with a Grid scheduler managing network, storage, and blade (compute) resources.
• A WS-Resource is used to model physical processor resources; WS-Resource Properties project processor status (such as utilization).
• The local processor manager is front-ended with a Web service interface; other kinds of resources are also modeled as WS-Resources.
• WS-Notification can be used to inform the scheduler when processor utilization changes.
• Jobs and tasks are also modeled using WS-Resources and Resource Properties.
• The Grid scheduler is itself a Web service.
• A Service Level Agreement is modeled as a WS-Resource; the lifetime of the SLA resource is tied to the duration of the agreement.
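To make the slide 22-23 scenario concrete, here is a minimal Python sketch of the pattern they describe: factory-created resources, queryable resource properties, a bounded lifetime, and notification of property changes to a subscribing scheduler. All names are hypothetical stand-ins; this is not the WSRF, WS-Notification, or Globus Toolkit interface.

    import time

    class WSResource:
        # Sketch of the WS-Resource pattern: uniquely identified state with
        # queryable properties, a lifetime, and change notification (illustrative only).
        def __init__(self, resource_id, properties, lifetime_s=None):
            self.resource_id = resource_id
            self.properties = dict(properties)
            self.expires_at = time.time() + lifetime_s if lifetime_s else None
            self.subscribers = []  # callbacks standing in for WS-Notification subscriptions

        def get_property(self, name):          # resource property query
            return self.properties[name]

        def set_property(self, name, value):   # a change triggers notifications
            self.properties[name] = value
            for notify in self.subscribers:
                notify(self.resource_id, name, value)

        def is_expired(self):                  # basis for scheduled destruction
            return self.expires_at is not None and time.time() > self.expires_at

    def create_resource(registry, resource_id, properties, lifetime_s=None):
        # Factory pattern: a service creates the resource; clients hold references to it.
        registry[resource_id] = WSResource(resource_id, properties, lifetime_s)
        return registry[resource_id]

    # A scheduler subscribes to utilization changes on a processor modeled as a resource.
    registry = {}
    cpu = create_resource(registry, "blade07/cpu0", {"utilization": 0.10})
    cpu.subscribers.append(lambda rid, prop, val: print(f"scheduler notified: {rid} {prop}={val}"))
    cpu.set_property("utilization", 0.85)  # prints: scheduler notified: blade07/cpu0 utilization=0.85

An SLA modeled the same way would simply be another resource whose lifetime_s equals the duration of the agreement, as slide 23 suggests.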
Slide 24: The Globus Alliance & Toolkit (Argonne, USC/ISI, Edinburgh, PDC)
• An international partnership dedicated to creating & disseminating high-quality open source Grid technology: the Globus Toolkit
  - design, engineering, support, governance
• Academic Affiliates make major contributions
  - EU: CERN, Imperial, MPI, Poznan
  - AP: AIST, TIT, Monash
  - US: NCSA, SDSC, TACC, UCSB, UW, etc.
• Significant industrial contributions
• 1000s of users worldwide, many of whom contribute

Slide 25: Globus Toolkit History: An Unreliable Memoir
(Timeline of milestones:) DARPA and NSF begin funding Grid work; NASA initiates the Information Power Grid; the Globus Project wins the Global Information Infrastructure Award; MPICH-G released; "The Grid: Blueprint for a New Computing Infrastructure" published; GT 1.0.0 released; early application successes reported; GT 1.1.1, 1.1.2, and 1.1.3 released; NSF and the European Commission initiate many new Grid projects; GT 1.1.4 and MPICH-G2 released; the "Anatomy of the Grid" paper released; first EuroGlobus conference held in Lecce; significant commercial interest in Grids; NSF GRIDS Center initiated; GT 2.0 beta released; the "Physiology of the Grid" paper released; GT 2.0 released; GT 2.2 released. (Counts shown are for Globus.org only; they exclude downloads from NMI, UK eScience, EU DataGrid, IBM, Platform, etc.)

Slide 26: Globus Toolkit Contributors Include
• Grid Packaging Technology (GPT): NCSA
• Persistent GRAM jobmanager: Condor
• GSI/Kerberos interchangeability: Sandia
• Documentation: NASA, NCSA
• Ports: IBM, HP, Sun, SDSC, …
• MDS stress testing: EU DataGrid
• Support: IBM, Platform, UK eScience
• Testing and patches: many
• Interoperable tools: many
• Replica Location Service: EU DataGrid
• Python hosting environment: LBNL
• Data access & integration: UK eScience
• Data mediation services: SDSC
• Tooling, Xindice, JMS: IBM
• Brokering framework: Platform
• Management framework: HP
• Funding ($$): DARPA, DOE, NSF, NASA, Microsoft, EU

Slide 27: GT-Based Grid Tools & Solutions
Built on the Globus Toolkit: Virtual Data Toolkit, Platform Globus, NSF Middleware Initiative, Butterfly Grid, EU DataGrid, IBM Grid Toolbox, MPICH-G2, Access Grid, Earth System Grid, Fusion Grid, BIRN Biomedical Grid, TeraGrid, NEESgrid, UK eScience Grid.

Slide 28: Problem-Driven, Collaborative Research Methodology
(Same cycle diagram as slide 9.)

Slide 29: Infrastructure
• Broadly deployed services in support of virtual organization formation and operation
  - authentication, authorization, discovery, …
• Services, software, and policies enabling on-demand access to important resources
  - computers, databases, networks, storage, software services, …
• Operational support for 24x7 availability
• Integration with campus infrastructures
• Distributed, heterogeneous, instrumented systems can be wonderful CS testbeds

Slide 30: Infrastructure Status
• Many infrastructure deployments worldwide
  - community-specific & general-purpose
  - from campus to international
  - most based on GT technology
• U.S. examples: TeraGrid, Grid2003, NEESgrid, Earth System Grid, BIRN
• Major open issues include practical aspects of operations and federation
• Scalability issues (number of users, sites, resources, files, jobs, etc.) are also arising
Slide 31: NSF Network for Earthquake Engineering Simulation (NEES)
Transform our ability to carry out research vital to reducing vulnerability to catastrophic earthquakes.

Slide 32: NEESgrid User Perspective
Secure, reliable, on-demand access to data, software, people, and other resources (ideally all via a Web browser!).

Slide 33: How It Really Happens (with the Globus Toolkit)
(Architecture figure: users work with client applications; application services organize VOs & enable access to other services; collective services aggregate and/or virtualize resources; resources implement standard access & management interfaces. Components shown include a web browser, data viewer tool, and simulation tool; CHEF and a CHEF chat teamlet; MyProxy and a certificate authority; Globus MCS/RLS and the Globus Index Service; Globus GRAM and Globus DAI; compute servers, database services, a camera, and a telepresence monitor. Of these components, 2 come from the application developer, 9 off the shelf, 4 from the Globus Toolkit, and 4 from the Grid community.)
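As a rough illustration of the slide-33 layering, the sketch below walks a request through credential acquisition, discovery, execution, and data registration. Every class and method is a hypothetical stand-in for the components named above (MyProxy, the Index Service, GRAM, MCS/RLS); none of this is the Globus Toolkit API.

    class CredentialService:                  # stands in for MyProxy + certificate authority
        def issue_proxy(self, user):
            return f"short-lived-proxy-for-{user}"

    class IndexService:                       # stands in for the Globus Index Service
        def __init__(self, resources):
            self.resources = resources
        def find(self, min_free_cpus):
            return [r for r in self.resources if r["free_cpus"] >= min_free_cpus]

    class ExecutionService:                   # stands in for a GRAM-style execution interface
        def submit(self, proxy, resource, executable, args):
            print(f"[{resource['name']}] {executable} {' '.join(args)} (credential: {proxy})")
            return "gsiftp://" + resource["name"] + "/results.dat"

    class ReplicaCatalog:                     # stands in for MCS/RLS
        def __init__(self):
            self.mapping = {}
        def register(self, logical_name, physical_name):
            self.mapping[logical_name] = physical_name

    # End-to-end use, as an application service acting on a user's behalf might do.
    proxy = CredentialService().issue_proxy("neesgrid-user")
    site = IndexService([{"name": "compute-server-1", "free_cpus": 64}]).find(min_free_cpus=16)[0]
    output_url = ExecutionService().submit(proxy, site, "run_simulation", ["--model", "m1"])
    ReplicaCatalog().register("lfn://neesgrid/results.dat", output_url)

The point of the layering is that the client-side flow stays the same whichever concrete compute server, database, or instrument sits behind the standard interfaces.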
Slide 34: Grid2003: An Operational Grid
• 28 sites (2100-2800 CPUs) & growing, including Korea
• 400-1300 concurrent jobs
• 7 substantial applications plus CS experiments
• Running since October 2003
• http://www.ivdgl.org/grid2003

Slide 35: Open Science Grid Components
• Computers & storage at 28 sites (to date)
  - 2800+ CPUs
• Uniform service environment at each site
  - the Globus Toolkit provides basic authentication, execution management, and data movement
  - the Pacman installation system enables installation of numerous other VDT and application services
• Global & virtual organization services
  - certification & registration authorities, VO membership services, monitoring services
• Client-side tools for data access & analysis
  - virtual data, execution planning, DAG management, execution management, monitoring
• IGOC: iVDGL Grid Operations Center

Slide 36: DOE Earth System Grid (www.earthsystemgrid.org)
Goal: address technical obstacles to the sharing & analysis of high-volume data from advanced earth system models.

Slide 37: Earth System Grid

Slide 38: Problem-Driven, Collaborative Research Methodology
(Same cycle diagram as slide 9.)

Slide 39: NEESgrid Multi-site Online Simulation Test
(Diagram: an NCSA computational model coupled to experimental models at UIUC and the University of Colorado, exchanging quantities labeled f1, f2, and m1. All computational models are written in Matlab.)

Slide 40: NEESgrid Multisite Online Simulation Test (July 2003)
(Photographs: the Illinois and Colorado experimental sites and the Illinois simulation.)

Slide 41: MOST: A Grid Perspective
(Diagram: the U. Colorado and UIUC experimental models and the NCSA computational model are each fronted by an NTCP server, with a simulation coordinator exchanging quantities labeled f1, f2, m1, x1, and e among them.)

Slide 42: Grid2003 Applications To Date
• CMS proton-proton collision simulation
• ATLAS proton-proton collision simulation
• LIGO gravitational wave search
• SDSS galaxy cluster detection
• ATLAS interactive analysis
• BTeV proton-antiproton collision simulation
• SnB biomolecular analysis
• GADU/Gnare genome analysis
• Various computer science experiments
www.ivdgl.org/grid2003/applications

Slide 43: Example Grid2003 Workflows
Genome sequence analysis, physics data analysis, Sloan Digital Sky Survey.

Slide 44: Example Grid3 Application: NVO Mosaic Construction
NVO/NASA Montage: a small (1200-node) workflow that constructs custom mosaics on demand from multiple data sources. The user specifies projection, coordinates, size, rotation, and spatial sampling. Work by Ewa Deelman et al., USC/ISI and Caltech.

Slide 45: Concluding Remarks
(Same cycle diagram as slide 9.)

Slide 46: eScience & Grid: 6 Theses
1. Scientific progress depends increasingly on large-scale distributed collaborative work.
2. Such distributed collaborative work raises challenging problems of broad importance.
3. Any effective attack on those problems must involve close engagement with applications.
4. Open software & standards are key to producing & disseminating required solutions.
5. Shared software & service infrastructure are essential application enablers.
6. A cross-disciplinary community of technology producers & consumers is needed.

Slide 47: Global Community

Slide 48: Utility Computing Is One of Several Commercial Drivers (based on a slide from HP)
(Figure: value increases with shared, traded resources along a progression from clusters, to grid-enabled systems, to the programmable data center, to the virtual data center, to a computing utility or Grid; example technologies shown include OpenVMS clusters, TruCluster, MC ServiceGuard, Tru64, HP-UX, Linux, switch fabric, compute, storage, and UDC.) Today: utility computing, on-demand, service orientation, virtualization.

Slide 49: Significant Challenges Remain
• Scaling in multiple dimensions
  - ambition and complexity of applications
  - number of users, datasets, services, …
  - from technologies to solutions
• The need for persistent infrastructure
  - software and people as well as hardware
  - currently no long-term commitment
• Institutionalizing the multidisciplinary approach
  - understand the implications for the practice of computer science research

Slide 50: Thanks, in particular, to:
• Carl Kesselman and Steve Tuecke, my long-time Globus co-conspirators
• Gregor von Laszewski, Kate Keahey, Jennifer Schopf, Mike Wilde, and other Argonne colleagues
• Globus Alliance members at Argonne, U. Chicago, USC/ISI, Edinburgh, PDC
• Miron Livny and the U. Wisconsin Condor project; Rick Stevens, Argonne & U. Chicago
• Other partners in Grid technology, application, & infrastructure projects
• DOE, NSF, NASA, and IBM for generous support

Slide 51: For More Information
• Globus Alliance: www.globus.org
• Global Grid Forum: www.ggf.org
• Open Science Grid: www.opensciencegrid.org
• Background information: www.mcs.anl.gov/~foster
• GlobusWORLD 2005: Feb 7-11, Boston
• The Grid: Blueprint for a New Computing Infrastructure, 2nd Edition: www.mkp.com/grid2