April 2003 1
Overview of Grid Computing
J. Charles Kesler
MCNC
April 2003 2
Overview
Introduction: Why Grids?
Applications for Grids
Basic Grid Architecture
Grid Platforms
  Market Segments
  Examples: Globus, OGSA, AVAKI
Building a Grid
  Project Manager’s View
  System Administrator’s View
  Example: The North Carolina BioGrid Project
Grid Reference Resources
April 2003 3
Why Grids? From the Viewpoint of Research Computing
Researchers are buying clusters
  A cluster for every researcher in many cases
  Of course, a cluster comes with a non-trivial amount of storage
Computational power is like commodity Internet bandwidth: all readily available capacity will be consumed
But there is a lot of capacity sitting idle in these cluster islands across organizations
Maintenance of clusters is often done inefficiently
  …by someone who would prefer to be doing something other than systems administration
April 2003 4
Current State of Research Computing
Researchers are asking IT to…
  Host and/or administer compute clusters
  Host applications and datasets
  Provide update and backup utilities for datasets
  Optimize and/or port applications
  Provide a front end for simplified access to resources
  Provide tools for workflow automation
That is, IT could benefit from a "utility computing" model to deliver services to researchers
April 2003 5
Collaboration in the Research Community
Researchers at multiple universities are often working together on the same grants, so they need to share:
  Hardware resources
  Applications
  Data sets
  Results
This sharing has to happen across multiple, mutually distrustful administrative domains
The buzzword: Virtual Organization (“VO”)
April 2003 6
Grid Computing’s Potential for Research
[Figure: North Carolina campuses (UNC-CH, NCSU, Duke, WFU, WSSU, NCArts, NCAT, UNC-C, UNC-A, ECSU, WCU, ASU, ECU, UNC-G, NCCU, UNC-W, UNC-P, FSU) linked into shared Virtual Computers and Virtual Databases]
Unified view of data and computers
  Computers and data appear to be local
Efficient access to large data sets
  Caching
  Replication
Attributes
  Single sign-on, security
  Policy-based resource sharing
April 2003 7
Grids According to the Experts
“Flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources.”
  From The Anatomy of the Grid by Foster, Kesselman and Tuecke
“A grid is all about gathering together resources and making them accessible to users and applications.”
  Dr. Andrew Grimshaw, CTO, Avaki
April 2003 8
Grids Are By Definition Heterogeneous
It’s about legacy resources, infrastructure, applications, policies, and procedures
The grid and its administrators must integrate in stealth mode… with
  Firewalls
  Filesystems
  Queuing systems
  Grumpy systems administrators
  Tried and true applications
April 2003 9
What It Means To…
The end user:
  Can transparently access resources in multiple VOs
  Can more easily collaborate with other researchers
The IT administrator:
  Has a secure framework for implementing distributed resource sharing
  Local resource administrators can control access to their resources
The manager:
  Sees better utilization of capital resources
  Has a tool that helps break down organizational barriers
April 2003 10
Challenges in Grid Computing
Reliable performance
Trust relationships between multiple security domains
Deployment and maintenance of grid middleware across hundreds or thousands of nodes
Access to data across WANs
Access to state information of remote processes
Workflow / dependency management
Distributed software and license management
Accounting and billing
April 2003 12
Applications for a Grid
Generally, apps that work well on clusters can work well on grids
Non-interactive / batch jobs
Parallel computations with minimal interprocess communication and workflow dependencies
Reasonable data transfer requirements
Sensible economics
April 2003 13
Non-Interactive / Batch Jobs
Difficult to get a real-time UI for jobs running on the grid
  A possible interactive application: spreadsheet computation
Want to take advantage of off-peak free cycles
  Jobs run for several days, weeks or months
  The user might prefer to be sleeping while the job runs!
Running processes might need to be interrupted or re-prioritized based on the current load on a grid compute engine
  Idle thread / “screensaver” computing
April 2003 14
Parallel Computations
Application needs to be able to run as multiple, mostly independent pieces
Good Example: Parameter space study
  Thousands++ of input files
  Processed independently by the same application
  Output file generated for each run (corresponding to an input file)
  Analysis of the results reported in the output files to find the optimal solution
  Need to build workflow management and results analysis tools around the grid-based components
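A parameter space study of this shape can be sketched in a few lines. The example below is only an illustration: `run_case` is a hypothetical stand-in for the real application, the cost function is invented, and a local thread pool stands in for grid compute nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def run_case(params):
    """Stand-in for one independent application run (hypothetical cost function)."""
    x, y = params
    return {"params": params, "score": (x - 3) ** 2 + (y + 1) ** 2}


def parameter_sweep(cases, max_workers=4):
    """Run every case independently, then analyze the outputs for the optimum."""
    # Each case needs no communication with any other case, which is what
    # makes this kind of workload a good fit for loosely coupled resources.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(run_case, cases))
    # The "results analysis" step: pick the run with the lowest score.
    best = min(results, key=lambda r: r["score"])
    return results, best


# One "input file" per (x, y) combination.
cases = [(x, y) for x in range(6) for y in range(-3, 3)]
results, best = parameter_sweep(cases)
print(len(results), best)
```

On a real grid the pool would be replaced by job submission and the results would be collected from output files, which is where the workflow and analysis tooling mentioned above comes in.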
April 2003 15
Minimal Interprocess Communications and Dependencies
Can’t depend on the network’s QoS
Can’t rely upon the order of execution and completion
Apps that need these things are better suited for tightly coupled compute platforms (e.g. SMP systems)
Grid can still be useful as a meta-scheduler and data source for such apps
  e.g. the user submits the job to the grid queue and asks for the best available SMP resource
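The "best available SMP resource" request could look roughly like the sketch below. The `Resource` fields and the ranking rule (lowest load first, then most free CPUs) are assumptions made for illustration, not the policy of any particular grid product.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    kind: str       # e.g. "smp" or "cluster"
    free_cpus: int
    load: float     # 0.0 idle .. 1.0 saturated


def pick_smp(resources, cpus_needed):
    """Return the least loaded SMP system with enough free CPUs, or None."""
    candidates = [
        r for r in resources
        if r.kind == "smp" and r.free_cpus >= cpus_needed
    ]
    if not candidates:
        return None
    # Prefer low load; break ties in favor of more free CPUs.
    return min(candidates, key=lambda r: (r.load, -r.free_cpus))


# Hypothetical registry contents.
registry = [
    Resource("smp-a", "smp", free_cpus=8, load=0.70),
    Resource("smp-b", "smp", free_cpus=16, load=0.20),
    Resource("cluster-1", "cluster", free_cpus=64, load=0.10),
]
choice = pick_smp(registry, cpus_needed=8)
print(choice.name if choice else "no SMP available")
```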
April 2003 16
Reasonable Data Transfer Requirements
It is usually necessary to “stage” files and executables as part of running a grid job
Data transfer time should be small relative to each component job’s run time
Solution: Caching and replication -- but these are not perfect and can be non-trivial to implement
Another solution: schedule the job where the data is (instead of bringing the data to the job)
Might be required if the data is only licensed for some nodes
But, if instead the application is only licensed to run on particular nodes, then the data has to be brought to where the application is
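The trade-offs on this slide can be written down as a small sketch. Everything here is hypothetical: the function names, the site names, and the simplification that a license is just a set of permitted sites. Sizes are decimal gigabytes and bandwidth is in megabits per second.

```python
def staging_seconds(size_gb, bandwidth_mbps):
    """Seconds to move size_gb (decimal gigabytes) over a bandwidth_mbps link."""
    return size_gb * 8 * 1000 / bandwidth_mbps


def transfer_ratio(size_gb, bandwidth_mbps, run_seconds):
    """Staging time as a fraction of run time; small values are good."""
    return staging_seconds(size_gb, bandwidth_mbps) / run_seconds


def place_job(data_site, app_sites, data_allowed_sites=None):
    """Decide where a job runs and what has to move.

    data_site: where the data set currently lives
    app_sites: sites where the application is licensed to run
    data_allowed_sites: sites the data may be copied to (None means anywhere)
    """
    # Preferred: schedule the job where the data already is.
    if data_site in app_sites:
        return data_site, "job-to-data"
    # Otherwise the data must travel to a site licensed for the application,
    # provided the data license permits a copy there.
    for site in sorted(app_sites):
        if data_allowed_sites is None or site in data_allowed_sites:
            return site, "data-to-job"
    return None, "blocked"


print(transfer_ratio(size_gb=10, bandwidth_mbps=100, run_seconds=8000))
print(place_job("duke", {"duke", "ncsu"}))
print(place_job("duke", {"ncsu"}))
print(place_job("duke", {"ncsu"}, data_allowed_sites={"duke"}))
```

Caching and replication would show up in such a model as extra candidate values for `data_site`; they are left out to keep the sketch short.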
April 2003 17
The Bottom Line: Sensible Economics
To Grid or Not To Grid:
Productivity Gains > Cost of Building Grid + Opportunity Costs of Resources
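As a worked example of the inequality, here is a minimal sketch. The function name and every figure are hypothetical; all three quantities are assumed to be in the same currency and over the same time horizon.

```python
def worth_building(productivity_gains, build_cost, opportunity_cost):
    """True when productivity gains exceed build cost plus opportunity costs."""
    return productivity_gains > build_cost + opportunity_cost


# Hypothetical figures only.
print(worth_building(500_000, 300_000, 150_000))  # total cost is 450,000
print(worth_building(500_000, 300_000, 250_000))  # total cost is 550,000
```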
April 2003 18
Some Costs and Benefits
Costs:
  Grid Middleware
  Architects and Developers
  User Training
  Infrastructure Hardware
  Opportunity Costs
    Would a big SMP box return better results for your problem?
Benefits:
  Better Utilization of Existing Capital Resources
  More Efficient Users
    Ability to complete more work in the same amount of time
  Performance near, or sometimes as good as, the big SMP box
April 2003 20
The Single System Model
[Figure: layered stack of a single system, top to bottom]
User Interface / API
Resource Discovery | Process Management | Authentication, Authorization, Accounting | Message Passing | Data Management
Operating System
Storage | Compute
April 2003 21
What Makes a Cluster a Cluster?
Uses a Distributed Resource Manager (DRM) to manage job scheduling
Tightly coupled - High speed, low latency interconnect network
Shared storage for home directories, high throughput scratch space, applications
Fairly homogeneous - Configuration management is important!
Single administrative domain
User accounts managed with traditional mechanisms
April 2003 22
The Cluster Model
[Figure: The Cluster Model. A master node (User Interface/API, Cluster DRM) and four cluster nodes (each with Storage, Compute, Operating System and Cluster DRM) are joined by a high speed interconnect, with shared storage and configuration management. Every node carries the same service layer: RD (Resource Discovery), PM (Process Management), 3A (Authentication, Authorization, Accounting), DM (Data Management), MP (Message Passing).]
April 2003 23
How is an Enterprise Grid Different from a Cluster?
Heterogeneous - Clusters, SMP, even workstations of dissimilar configurations, but all are tied together through a grid middleware layer
Lightly coupled - Connected via 100 or 1000Mbps Ethernet
Introduces a resource registry and grid security service
But usually only a single registry and security service for the grid
Not necessarily a single administrative domain
April 2003 24
The Enterprise Grid Model
[Figure: The Enterprise Grid Model. Clusters (each a Cluster DRM fronting its cluster nodes through a Cluster Interface), SMP systems and a User Interface/API node each sit behind a Grid Interface and are joined over the enterprise LAN or WAN, together with a Security Infrastructure and a Resource Registry. Each node carries the RD, PM, 3A, DM and MP service layer.]
April 2003 25
How is a Global Grid Different from an Enterprise Grid?
"Grid of Grids" - Collection of enterprise grids
Loosely coupled between sites - Not much control over QoS*
Mutually distrustful administrative domains
Multiple grid resource registries and grid security services
*Not true for grids in the NCREN network!
April 2003 26
The Global Grid Model
[Figure: The Global Grid Model. Sites A, B and C are each an enterprise grid on its own LAN, with clusters, SMP systems and UI/API nodes behind grid interfaces and a per-site Resource Registry (RR) and Security Infrastructure (SI); the sites are connected over the WAN.]
April 2003 28
Grid Platforms -- Market Segments
One Way to Categorize Grids:
  Toolkits
  Integrated Environments
Or Another Way to Look at Grids:
  Server Aggregation
  Desktop Aggregation
April 2003 29
Where Platforms Fit in the Market
[Figure: platforms positioned on a two-by-two chart, with columns Desktop Aggregation and Server Aggregation and rows Toolkits and Integrated Environments]
• Globus
• OGSA
• Avaki
• United Devices
• Data Synapse
• Entropia
• Parabon
• NMI
• IBM Grid Toolbox
• Platform LSF Multi-Cluster
• BOINC
April 2003 30
The Early Adopter Market for Grid Technology
[Figure: early adopter markets placed on the same chart, with columns Desktop Aggregation and Server Aggregation and rows Toolkits and Integrated Environments]
Private Sector
  Pharmaceuticals
  Banking & Finance
  Energy
Mix of Industry and Academia
  Life Sciences
  Entertainment
Public Sector
  Academia
  Government
  National Labs
(does anyone want this?)
April 2003 31
Grid Platform Example: Globus Toolkit V2
Primary development occurred at Argonne National Labs
Principals were Ian Foster and Carl Kesselman
Open source
  But architecture development was a closed process
Toolkit approach: different “bundles” that can be installed depending upon what functions are desired
API through CoG (Commodity Grid) kits
  Java, Python, CORBA, Perl, Matlab, Web services, JSP
April 2003 32
Globus Toolkit V2
Majority of its use is in university and government research environments
Some vendors offer value-added versions
  IBM Grid Toolbox
  Platform Globus
NSF Middleware Initiative (NMI) is packaging pre-built Globus with other relevant components
  NWS (Network Weather Service)
  KX.509/KCA (Kerberos-X.509 integration)
  Condor-G as a “metascheduler”
  GSI-enabled OpenSSH
April 2003 33
Globus Toolkit V2 “Pillars”
Pillars:
  Information Services (MDS)
  Data Management (GASS)
  Resource Management (GRAM)
Common foundation: Grid Security Infrastructure (GSI)
April 2003 34
Globus Toolkit V2 Stack
[Figure: protocol stack, top to bottom]
MDS | GASS/GridFTP | GRAM
GSI
HTTP | LDAP | FTP
TLS/SSL
TCP/IP
April 2003 35
Globus Toolkit V2 Key Components: GRAM, MDS and GASS
Grid Resource Allocation Manager (GRAM)
  Server-side: “gatekeeper” process that controls execution of job managers
  Client-side: “globusrun” UI to launch jobs
Monitoring and Directory Service (MDS)
  GRIS: Grid Resource Information Service collects local info
  GIIS: Grid Index Information Service collects GRIS info
Global Access to Secondary Storage (GASS)
  GridFTP, implemented through “in.ftpd” daemon and “globus-url-copy” command
  Files accessed through a URI, e.g. gsiftp://node1.ncbiogrid.org/data/ncbi/ecoli.nt
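To show how such a URI breaks down into host and path, here is a small sketch using Python's standard `urllib.parse`. The helper name `parse_gsiftp` is hypothetical and is not part of the Globus Toolkit; it only inspects the string and does not contact any server.

```python
from urllib.parse import urlsplit


def parse_gsiftp(uri):
    """Split a gsiftp:// URI into (host, port, path); port is None if absent."""
    parts = urlsplit(uri)
    if parts.scheme != "gsiftp":
        raise ValueError(f"not a gsiftp URI: {uri!r}")
    return parts.hostname, parts.port, parts.path


host, port, path = parse_gsiftp("gsiftp://node1.ncbiogrid.org/data/ncbi/ecoli.nt")
print(host, port, path)
```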
April 2003 36
Globus Toolkit V2 Key Components: GSI
Uses a TLS/SSL-based PKI
All server resources (i.e. gatekeeper, GRIS) and users have a private key and a public key that has been digitally signed by the CA (the “certificate”)
  “grid-cert-request” to generate key pair
  User/sysadmin sends the public key to CA
  CA signs the public key with its private key and returns the signed certificate to the user/sysadmin
  The user/sysadmin stores the signed certificate in the local filesystem
  Certificate contains: the subject name, the subject’s public key, the CA’s name, and the CA’s signature
April 2003 37
Globus Toolkit V2 Key Components: GSI
Logging in to the grid (“grid-proxy-init”):
  User creates a temporary public-private key pair
  User’s private key is used to digitally sign the temporary public key -- this becomes the “proxy” certificate
  This creates a chain of trust from the CA to the user to the proxy certificate
  The proxy certificate and associated private key are transmitted with a job
The proxy certificate can be used to issue commands on remote servers on the user’s behalf (“delegation”)
On remote servers, there is a “grid-mapfile” that maps user cert subject names to local userids
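The chain of trust and the grid-mapfile lookup can be illustrated with a toy model. This is a sketch under loud assumptions: it compares subject and issuer names only and performs no cryptographic verification, the `Cert` class and all names in it are invented, and the map file is assumed to hold one quoted subject name followed by a local userid per line.

```python
import shlex
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    subject: str   # who the certificate names
    issuer: str    # who signed it


def chain_links_to_ca(chain, trusted_ca):
    """chain is ordered proxy first, user certificate last.

    Toy check: each certificate must be issued by the next subject in the
    chain, and the last one by the trusted CA. A real implementation also
    verifies signatures and validity periods; this compares names only.
    """
    if not chain:
        return False
    for child, parent in zip(chain, chain[1:]):
        if child.issuer != parent.subject:
            return False
    return chain[-1].issuer == trusted_ca


def parse_grid_mapfile(text):
    """Map quoted certificate subject names to local userids."""
    mapping = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = shlex.split(line)  # honors the quotes around the subject
        if len(fields) >= 2:
            mapping[fields[0]] = fields[1]
    return mapping


def local_user(chain, trusted_ca, mapping):
    """Return the local userid for a trusted chain, else None."""
    if not chain_links_to_ca(chain, trusted_ca):
        return None
    # The mapping is keyed on the user's subject name, not the proxy's.
    return mapping.get(chain[-1].subject)


# Hypothetical names throughout.
CA = "/O=Example Grid/CN=Example CA"
USER = "/O=Example Grid/OU=Example Org/CN=Jane Doe"
user_cert = Cert(subject=USER, issuer=CA)
proxy_cert = Cert(subject=USER + "/CN=proxy", issuer=USER)
MAPFILE = '''
# certificate subject, then local userid
"/O=Example Grid/OU=Example Org/CN=Jane Doe" jdoe
'''
mapping = parse_grid_mapfile(MAPFILE)
print(local_user([proxy_cert, user_cert], CA, mapping))
```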
April 2003 38
Globus Toolkit V2 Additional Components
Grid Packaging Tools (GPT)
  Used to build (“gpt-build”), install (“gpt-install”) and localize (“gpt-postinstall”) Globus components
MPICH-G2
  A Globus V2 enabled version of MPI (Message Passing Interface)
  Based on MPICH
  Utilizes GSI, MDS and GRAM
April 2003 39
Globus Toolkit V2 Network Services
[Figure: Globus Toolkit V2 network services. A Certificate Authority, a GIIS server, a client node running the GRAM client, and four grid nodes (each running GRIS, gatekeeper and in.ftpd) are attached to a common network.]
April 2003 40
GRAM, MDS and GASS Interactions
[Figure: GRAM, MDS and GASS interactions. The client, acting as the user or proxy, reaches GRAM (gatekeeper, job manager and the processes it starts) for job allocation and job management over RSL/DUROC/HTTP 1.1; MDS (GRIS on each resource, indexed by GIIS) for resource discovery over LDAP; and GASS (GridFTP in.ftpd) for data transfer and data control over gsiftp.]
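The resource discovery path (GRIS collects local information, GIIS indexes the GRIS servers) can be mimicked with an in-memory toy. The classes below are hypothetical and only model the two roles; the real services are queried over LDAP, which this sketch replaces with plain Python calls, and the host names and attributes are invented.

```python
class GRIS:
    """Toy stand-in for a per-node information service."""

    def __init__(self, host, info):
        self.host = host
        self._info = dict(info)

    def report(self):
        """Return this node's local information as one record."""
        return {"host": self.host, **self._info}


class GIIS:
    """Toy stand-in for an index that aggregates GRIS reports."""

    def __init__(self):
        self._sources = []

    def register(self, gris):
        self._sources.append(gris)

    def search(self, predicate):
        """Return every current record for which predicate(record) is true."""
        return [rec for rec in (g.report() for g in self._sources) if predicate(rec)]


index = GIIS()
index.register(GRIS("node1.example.org", {"free_cpus": 2, "os": "Linux"}))
index.register(GRIS("node2.example.org", {"free_cpus": 16, "os": "AIX"}))
index.register(GRIS("node3.example.org", {"free_cpus": 8, "os": "Linux"}))

hits = index.search(lambda rec: rec["os"] == "Linux" and rec["free_cpus"] >= 4)
print([rec["host"] for rec in hits])
```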
April 2003 41
Globus Toolkit V2 Strengths and Weaknesses
Strengths:
  Mindshare and collaboration in both industry & academia
  Open source
  Standards-based underpinnings (e.g. SSL, LDAP)
  Flexibility and CoG APIs
  Driving OGSA with heavy resource commitment from IBM
Weaknesses:
  Significant effort required to get applications working on a grid
  Not production quality at this time
  No “metascheduler” -- user has to explicitly tell their jobs where to run
April 2003 42
Grid Platform Example: OGSA
Merging Grid and Web Services technologies
Developing open standards for grid computing
  Sponsored by the GGF (organization modeled after IETF)
  Primary working groups: OGSA and OGSI
  Many vendors involved: IBM, Sun, Oracle, AVAKI, UD, etc… (But, ANL and IBM seem to have the upper hand)
  Working with the W3C to extend web services
Still in alpha / early beta form
There will be open source and commercial implementations
  Open source: Globus 3
  Commercial: IBM (WebSphere), AVAKI, UD, etc…
April 2003 43
Some Key OGSA Concepts
Grid Service Handle (GSH)
  GSH is a globally unique name assigned to every resource
  Does not contain any protocol or instance specific information such as network address
Grid Service Reference (GSR)
  Contains the instance-specific information (e.g. network address)
  Only valid for a limited lifespan
Factory
  Creates and manages grid services per user request
  Returns the GSH and GSR for a new instance
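The relationship between the three concepts can be sketched as follows. This is a toy model, not the OGSA or Globus API: the handle format, the `resolve` method, the 60 second lifetime and the address are all assumptions for illustration, and time is passed in explicitly so the behavior is easy to follow.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class GSR:
    handle: str        # the GSH this reference belongs to
    address: str       # instance-specific detail, e.g. a network address
    expires_at: float  # a reference is only valid for a limited lifespan


class Factory:
    """Creates service instances and hands back (GSH, GSR) pairs."""

    def __init__(self, address, lifetime=60.0):
        self._address = address
        self._lifetime = lifetime
        self._ids = itertools.count(1)
        self._instances = {}

    def create(self, now):
        # The handle is a stable name with no address information in it.
        handle = f"gsh:example-service/{next(self._ids)}"
        ref = GSR(handle, self._address, now + self._lifetime)
        self._instances[handle] = ref
        return handle, ref

    def resolve(self, handle, now):
        """Return a currently valid reference for a handle (hypothetical helper)."""
        ref = self._instances[handle]
        if ref.expires_at <= now:
            # The old reference has lapsed; issue a fresh one for the same handle.
            ref = GSR(handle, self._address, now + self._lifetime)
            self._instances[handle] = ref
        return ref


factory = Factory(address="http://host.example.org:8080/service")
gsh, gsr = factory.create(now=0.0)
later = factory.resolve(gsh, now=100.0)
print(gsh, gsr.expires_at, later.expires_at)
```

The point of the split is visible in the last lines: the handle stays the same across the whole lifetime of the service, while the reference that carries the address is replaced once it expires.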
April 2003 44
OGSA / Globus 3.0 Preview Release
Implementation of the Grid Service Specification
Built on top of Apache Axis and Java CoG
Based in J2EE environment; limited .NET and C support at this point
Globus Toolkit 3.0 expected release
  Alpha - Jan 13, 2003 @ GlobusWorld
  Final - June 2003
April 2003 45
OGSA / Globus 3.0 Stack
[Figure: protocol stack, top to bottom]
MDS | GASS/GridFTP | GRAM
Grid Services Abstraction
SOAP + GSI | Other Transports
TLS/SSL
TCP/IP
April 2003 46
OGSA Example
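[Figure: OGSA example. User A and User B interact with a Registry, a Mapping Service, an Auth Factory Service that creates an Auth Service Instance, and an Application Factory Service that creates an Application Service Instance. Arrow labels: Request to Create Auth Service, Request to Auth User, User Auth Info, GSH, GSR.]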
April 2003 47
Grid Platform Example: AVAKI
Original technology came from the Legion project at UVa (which was also used as part of NPACI); principal is Andrew Grimshaw (now CTO)
Integrated solution - load and run
Object-oriented architecture
Data Grid (v3.0) - new architecture meant as the stepping stone to OGSA; implemented with J2EE
Compute Grid (v2.6) - latest release of Legion-based technology; has compute and data grid integrated
Comprehensive Grid: 3.0 Data + 2.6 Compute Grids
April 2003 48
AVAKI 3.0 Data Grid Architecture
[Figure: an AVAKI domain with Domain Controllers backed by LDAP (user info) and Grid Servers (metadata). Share Servers export local directories (/dmf/edu, /home/edu, /local/data, /data/ncbi, /data/riceblast) into a single /grid namespace (/grid/dmf/edu, /grid/home/edu, /grid/data, /grid/data/ncbi, /grid/data/riceblast); a Data Access Server (NFS) presents that namespace, and an interconnect links to other grids.]
April 2003 49
AVAKI Strengths and Weaknesses
Strengths
  Vendor support
  Easy to deploy
  Data grid
  Comprehensiveness
  Works through firewalls (w/ its Proxy server)
  Moving towards OGSA
Weaknesses
  Vendor is a relatively small company
  Doesn't have significant mindshare
  Currently does not publish its APIs
April 2003 51
Building a Grid - The Project Manager’s View
Keys to success:
  Realize that grids are built, not bought!
  Early identification of business drivers and potential applications for the grid project
  Have a brainstorming session with stakeholders (e.g. power users, sys admins, managers)
Doing these things should help you quickly identify:
  Is there a good business case for building a grid?
  What’s the right kind of grid to build?
    Desktop or Server Aggregation?
    Integrated or Toolkit?
April 2003 52
Building a Grid - The Project Manager’s View
Use a Lifecycle Project Model, e.g.
  Requirements: identify apps, users and their needs
  Initial Planning: scope out hardware, middleware
  Prototype: build a testbed
  Review results with stakeholders
  Final Planning: gap analysis for production implementation
  Deploy: purchase & install hw, sw; training for users
  Maintain: break-fix, identify and gridify other apps
  (Iterate!)
April 2003 53
Building a Grid - The Systems Administrator’s View
Establish installation and operational standards
Establish security infrastructure to manage grid identities
Establish resource registry infrastructure
Install grid middleware and configure for appropriate services, e.g.
  Compute engines
  Data sources
Establish grid identities for services and users
Work with users to gridify their applications
April 2003 54
Building a Grid - Example: The North Carolina BioGrid Testbed
Objective was to develop a testbed environment to serve as:
  A staging area for the production NC BioGrid
  A research platform for Grid researchers
  An interoperability testbed for the computing hardware, middleware, and application software vendor community
Testbed representative of production environment
  Hardware and software platforms
  User client platforms
  Location dynamics
Testbed needs to be persistent
April 2003 55
NC BioGrid Key Decisions
Focus on data grid
  The best way to deploy a petabyte of storage for bio applications is to aggregate existing pools of storage (no one has $50M to $80M to spend on storage!)
  But is a data grid useful without a compute grid? Probably not
Focus on server aggregation
  Although there are a lot of idle UNIX workstations and PCs on the campus, desktop aggregation is a problem we will look at later
Not picking a horse (yet) on Grid middleware
  Testing AVAKI and Globus
April 2003 56
NC BioGrid Testbed (Phase 1)
[Figure: testbed topology. NCSC / RTP hosts the development and staging systems: an IBM LTO library, Sun T3, IBM p690 and SunFire 3800 on an FC switch, plus an IBM eServer 1300, a SunFire V880 and a client workstation on a 10/100 LAN. NC State / Raleigh, UNC / Chapel Hill and Duke / Durham each show an IBM eServer 1300 and a client workstation on the campus network with Gig-E links. The sites are joined over NCREN (OC-48).]
April 2003 57
Site Connection & Data Transport: North Carolina Research & Education Network
[Figure: NCREN map showing Asheville, Boone, Charlotte, Cullowhee, Elizabeth City, Fayetteville, Greensboro, Greenville, Morehead City, Pembroke, Rocky Mount, RTP, Wilmington and Winston-Salem, along with NCSU, NCSU Centennial Campus, NCCU, Duke, UNC-CH and MCNC, and links from the RTP rPoP to Qwest and Abilene (OC-48)]
NCREN3
  High bandwidth (OC-3, OC-12, OC-48)
  High reliability (multiple paths to rPoPs)
  Very resilient (all new equipment)
April 2003 60
Some Selected Grid Reference Resources
NC BioGrid: http://www.ncbiogrid.org/
  Also: http://www.ncbiogrid.org/resources/grid/index.html
The Global Grid Forum: http://www.gridforum.org/
AVAKI: http://www.avaki.com/
The Globus Project: http://www.globus.org/
IBM RedBook on Globus Computing: http://www.redbooks.ibm.com/pubs/pdfs/redbooks/sg246895.pdf
NSF Middleware Initiative: http://www.nsf-middleware.org/
April 2003 61
Overview of Grid Computing
Questions?