grid computing - an overview michael p. cummings laboratory of molecular evolution center for...
Post on 21-Dec-2015
Grid Computing - An Overview
Michael P. Cummings
Laboratory of Molecular Evolution
Center for Bioinformatics and Computational Biology
Acknowledgments
Core Middleware Development
  Adam Bazinet
  Daniel Myers
  John Fuetsch
  Stephen McLellan, Chris Milliron, Deji Akinyemi
Semantic Web Grid Services/Workflows
  Sung Lee, Fujitsu Laboratories of America
  Nada Hashmi, UMIACS (now CBA, Saudi Arabia)
  David Wang, UMIACS
Outline
Grid computing introduction and motivation
Goals of The Lattice Project
Basic architecture
Our current production Grid system
Implementation details
Results of usage
Research and development
Grid Computing: A Definition
A model of distributed computing that uses geographically and administratively disparate resources. In Grid computing, individual users can access computers and data transparently, without having to consider location, operating system, account administration, and other details. In Grid computing, the details are abstracted, and the resources are virtualized.
Grid Computing: Characteristics
Resources are heterogeneous
Resources are administratively disparate
Resources are geographically disparate
Users do not have to worry about system details (e.g., location, operating system, accounts)
Grid Computing: Advantages
Provides increased resources for research
Utilizes resources already purchased
Space and HVAC needs already met
Little increased administrative burden
Economically and environmentally appealing
Types of Grid Middleware
Heavyweight/feature rich (e.g., Globus Toolkit)
  Multiple users and multiple applications
  Mechanisms for authentication, authorization, communication, file access, resource discovery and specification
  Push model: jobs are assigned to specific resources
Types of Grid Middleware
Desktop Grids (e.g., Berkeley Open Infrastructure for Network Computing [BOINC])
  Single user and single application
  Limited features
  Pull model: clients contact server for jobs
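The difference between the push and pull models can be illustrated with a minimal Python sketch. The class names, resource names, and the "fewest jobs" assignment rule are all hypothetical and are not taken from Globus or BOINC; the sketch only shows who initiates the hand-off of work in each model.

```python
from collections import deque


class PushScheduler:
    """Push model: the scheduler picks a resource for each job."""

    def __init__(self, resources):
        self.assignments = {name: [] for name in resources}

    def submit(self, job):
        # Assign the job to the resource holding the fewest jobs so far
        # (an illustrative rule, not any real middleware's policy).
        target = min(self.assignments, key=lambda r: len(self.assignments[r]))
        self.assignments[target].append(job)
        return target


class PullServer:
    """Pull model: the server only queues work; clients ask for it."""

    def __init__(self, jobs):
        self.queue = deque(jobs)

    def request_work(self, client):
        # A client contacts the server whenever it has spare cycles.
        # `client` is unused here; a real server would record who took the work.
        return self.queue.popleft() if self.queue else None


push = PushScheduler(["cluster-a", "cluster-b"])
push_targets = [push.submit(job) for job in ["job1", "job2", "job3"]]

pull = PullServer(["wu1", "wu2"])
pulled = [pull.request_work("desktop-1") for _ in range(3)]
```

In the push case the scheduler must know about every resource in advance; in the pull case the server needs no list of clients at all, which is what makes the model workable for large numbers of volunteer machines.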
An Example: SETI@home
A scientific experiment that uses Internet-connected computers in the Search for Extraterrestrial Intelligence (SETI), a scientific effort seeking to determine if there is intelligent life outside Earth. The project analyzes radio signals to look for patterns that might be associated with intelligent life.
SETI@home statistics (Monday)
Total participants: 5,521,708
Rate of signup: a new participant every 96 seconds
Effective number of computers: at any given moment there are the equivalent of >412,000 computers working full time
Results received: 2,200,991,756
Total CPU time: 2,555,681 years
Why Go Grid?
To speed research
  Parallel execution means higher throughput
To make compute resources commodities
  Analogous to the electrical power grid
To foster efficiency and interaction in the research community
  Use of the Grid spans departments and domains
  Grid resources are typically shared resources
The Lattice Project: Initial Goals
Develop a Grid system for research that:
  Speeds up workflows by “Grid-enabling” various programs
  Is simple and intuitive
  Takes advantage of heterogeneous resources
  Is capable of managing large numbers of jobs (thousands)
  Supports multiple users and lowers the barriers to getting involved
  Is community-driven and supported
Principles of Design
Make use of well-supported open source software
  Globus Toolkit
  BOINC
  Condor
Engineered software should be scalable, modular, and robust
Expose programs as well-defined services
  Arbitrary user-supplied code cannot be run
Grid: Development Challenges
Many middleware systems are not compatible
Middleware is cumbersome
Developing a Grid service is often difficult
Terminology
Client: A Grid user interface OR a machine that performs computation
Grid Service: A Grid-enabled program
Scheduler: A program that decides where Grid jobs will run
Resource: Executes Grid jobs
Basic Architecture (1 of 3)
Basic Architecture (2 of 3)
Basic Architecture (3 of 3)
Software Components
Globus Toolkit version 3.2.1
  Backbone of the Grid
  http://www.globus.org/
Condor-G
  Grid-level scheduler / resource broker
  http://www.cs.wisc.edu/condor/
BOINC: Berkeley Open Infrastructure for Network Computing
  SETI@home-style desktop grid
  http://boinc.berkeley.edu/
Custom components
  GSBL, GSG, Globus-BOINC adaptor, MDS-matchmaking bridge, user interface(s), administrative scripts, and much more
Globus Toolkit 3
Key components:
  Globus Core
    Grid service hosting environment
  GSI – Grid Security Infrastructure
    Uses public key cryptography
    Secures communication
    Authenticates and authorizes Grid users
  WS GRAM – Job management
  GASS – Point-to-point file transfer
  MDS2 – Information provider
Condor-G
Condor-G is part of the Condor suite
Resources and jobs send Condor-G descriptions of themselves called ClassAds
Condor-G matches Grid jobs to suitable resources, then submits and manages them
This process is called matchmaking
BOINC
Most novel feature of our Grid
  Public computing model
  Untrusted resources
Potentially our largest resource
We have targeted 3 platforms: Windows / Linux x86 / Mac OS X
Our Current Grid System
User Interface
  The “Grid Brick”: a machine used to submit Grid jobs
    Our primary interface for Grid users
    Command line clients mimic normal program execution
  Lattice Intranet
    Provides instructions for submitting jobs and managing data input and output
    Provides tools for describing and monitoring jobs
  Other possibilities:
    Web portal model of job submission
    A client capable of composing complex workflows using Task Computing and Semantic Web technology developed by collaborators at Fujitsu
Basic Architecture – Client/Service
Grid Client Stack
lattice_submit / lattice_retrieve
Service-specific* submit / retrieve scripts
Client.pm – base Perl module
Service-specific* submit / retrieve classes
GSBL – Grid Service Base Library
Globus API
[Diagram labels for the layers above: Command-line Interface, Perl, Java]
* Service-specific templates and stubs are created by the Grid Service Generator
Grid Service Stack
Service-specific* Implementation
GSBL – Grid Service Base Library
Globus API
Grid Service Hosting Environment, a.k.a. “the container”
[Diagram label for the layers above: Java]
* Service-specific templates and stubs are created by the Grid Service Generator
Tools for Writing Grid Services
Grid Service Base Library (GSBL)
  Java API for building Grid services with the Globus Toolkit
  Shields programmers from having to work with the Globus API directly
  Provides a high-level interface for operations such as job submission and file transfer
Grid Service Generator (GSG)
  Simplifies the process of creating Grid Services
  Intended for use with GSBL
GSBL: Design and Features
Classes for:
  Clients and services (base classes)
  Argument description and processing
  File transfers
  Job submission and control
  Security configuration
  Java synchronization and Globus notifications to paper over event-based model
[Diagram: a Client stack and a Service stack, each layered as Application (e.g., BLAST) / GSBL / Globus API]
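The idea of using synchronization to "paper over" an event-based model can be sketched in a few lines. GSBL itself is a Java library and its actual classes are not shown in these slides, so the Python below is only an illustration of the general pattern: a callback-style notification is wrapped so that the caller sees a plain blocking call. `BlockingJob` and its methods are hypothetical names, and a timer thread stands in for the notification source.

```python
import threading


class BlockingJob:
    """Wraps a callback-style completion notice in a blocking call."""

    def __init__(self):
        self._done = threading.Event()
        self._status = None

    def on_notification(self, status):
        # Called by the event source (a timer thread stands in for it here).
        self._status = status
        self._done.set()

    def wait(self, timeout=None):
        # Block the caller until a notification arrives or the timeout expires.
        if not self._done.wait(timeout):
            raise TimeoutError("no notification received")
        return self._status


job = BlockingJob()
threading.Timer(0.05, job.on_notification, args=("finished",)).start()
result = job.wait(timeout=5)
```

The caller never registers a listener or handles an event; it simply calls `wait()` and gets the status back, which is the kind of simplification a base library can offer to service authors.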
Grid Service Generator
Deploying a Grid service with Globus is absurdly complicated
  Many files, namespaces: lots of potential typos
GSG takes as input a few parameters (service name, location, an XML argument description, etc.) and generates all requisite configuration files and skeleton Java classes
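A template-driven generator of this kind can be sketched as follows. This is not GSG's implementation: the file names, the template contents, and the `generate` function are all hypothetical, and the real set of files Globus requires is not listed in these slides. The sketch only shows why generating every file from one set of parameters removes the opportunity for typos.

```python
from string import Template

# Hypothetical templates; the actual files GSG produces are not described here.
TEMPLATES = {
    "${name}Service.java": Template(
        "public class ${name}Service {\n"
        "    // TODO: service implementation for ${name}\n"
        "}\n"
    ),
    "${name}-config.xml": Template(
        '<service name="${name}" location="${location}"/>\n'
    ),
}


def generate(name, location):
    """Return {filename: contents} for one service, filled from templates."""
    params = {"name": name, "location": location}
    return {
        Template(filename).substitute(params): body.substitute(params)
        for filename, body in TEMPLATES.items()
    }


files = generate("MrBayes", "services/mrbayes")
```

Because the service name is typed once and substituted everywhere, the class name, the file name, and the configuration entry cannot drift apart.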
Grid Services
Application   Condor (Linux/UNIX)   BOINC Linux x86   BOINC Win32   BOINC Mac OS X
BLAST¹        Yes                   No                No            No
Clustal W     Yes                   Yes               Yes           Yes
CNS           Yes                   Yes               Yes           No
Lamarc        Yes                   Yes               Yes           Yes
MDIV          Yes                   Yes               Yes           Yes
Migrate-N     Yes                   Yes               Yes           Yes
Modeltest     Yes                   Yes               Yes           Yes
MrBayes       Yes                   Yes               Yes           Yes
ms            Yes                   Yes               Yes           Yes
Muscle        Yes                   Yes               Yes           Yes
PAUP*²        Yes                   No                No            No
Phyml         Yes                   Yes               Yes           Yes
Pknots        Yes                   Yes               Yes           Yes
Seq-gen       Yes                   Yes               Yes           Yes
Snn           Yes                   Yes               Yes           Yes
ssearch       Yes                   Yes               Yes           Yes
Structure³    Yes                   No                No            No
Grid Services
Creating Grid Services requires:
  Knowledge of the application
  Techniques for compiling and porting the application to various platforms
  Knowledge of the infrastructure so it can be effectively tested and deployed
Challenges:
  Maintaining bodies of Grid Service code as the number of applications grows and new versions of applications are released
  Minimizing the number of updates that need to be applied when the framework changes
Basic Architecture - Scheduling
Condor-G: ClassAds
Resources and jobs send Condor-G descriptions of themselves called ClassAds
  Jobs require certain capabilities of resources
  Resources advertise their capabilities
Similar to a dating service: central broker points pairs of compatible jobs/resources at each other
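Condor-G: ClassAds
[Diagram: matchmaking through the Condor Collector]
Resources A, B, and C each advertise a program to the Condor Collector: “I have MrBayes!”, “I have SSEARCH!”, “I have PAUP*!”
Condor user: “I need MrBayes!”
Condor user to Resource C: “I hear you have MrBayes?”
Resource C: “Well, let's talk about that...”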
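The exchange in the diagram can be sketched as a tiny matchmaker. This is a deliberate simplification, not Condor-G's algorithm: real ClassAds are expressions evaluated in both directions (resources can also place requirements on jobs), whereas here an ad is a plain dictionary and matching is a subset test. Resource C offering MrBayes follows the diagram; which of Resources A and B offers SSEARCH and which offers PAUP* is an assumption.

```python
def matches(job_ad, resource_ad):
    """True when the resource offers every capability the job requires."""
    return job_ad["requires"] <= resource_ad["offers"]


def matchmake(job_ad, resource_ads):
    """Return the names of all resources compatible with the job."""
    return [ad["name"] for ad in resource_ads if matches(job_ad, ad)]


# Resource ads, as the collector would hold them (A and B are assumed).
resources = [
    {"name": "Resource A", "offers": {"SSEARCH"}},
    {"name": "Resource B", "offers": {"PAUP*"}},
    {"name": "Resource C", "offers": {"MrBayes"}},
]
# Job ad from the Condor user who needs MrBayes.
job = {"owner": "Condor user", "requires": {"MrBayes"}}
candidates = matchmake(job, resources)
```

The broker only introduces the two parties; as in the diagram, the job and the chosen resource then negotiate directly.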
Generating ClassAds
Job ClassAds are generated by the Condor-G job manager
  Job requirements are specified in the Grid service configuration files
Resource ClassAds are generated by extracting information from MDS
  Lattice information providers supply data required for matchmaking
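Turning resource information into a ClassAd amounts to rendering attribute/value pairs as text. The sketch below assumes the information has already been extracted into a flat dictionary; it does not query MDS, and it is not the Lattice bridge code. The attribute names (in particular `HasMrBayes`) and the record values are hypothetical, and only the simple `Name = value` form of a ClassAd is produced.

```python
def to_classad(record):
    """Render a flat dict as ClassAd-style 'Name = value' lines."""
    lines = []
    for key, value in record.items():
        if isinstance(value, bool):
            # Check bool before int: bool is a subclass of int in Python.
            rendered = "True" if value else "False"
        elif isinstance(value, (int, float)):
            rendered = str(value)
        else:
            # Strings are double-quoted, with embedded quotes escaped.
            rendered = '"' + str(value).replace('"', '\\"') + '"'
        lines.append(f"{key} = {rendered}")
    return "\n".join(lines)


# Hypothetical record, standing in for data pulled from MDS.
record = {"Name": "resource-c", "OpSys": "LINUX", "Cpus": 4, "HasMrBayes": True}
ad_text = to_classad(record)
```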
Monitoring and Discovery System (MDS2)
Globus information services component
  LDAP-based (new version XML-based)
Answers questions like:
  What resources are available?
  What capabilities do these resources have?
  What is the load on these resources?
This in turn allows for intelligent decisions to be made in areas such as scheduling and resource accounting
Basic Architecture - Resources
Current Grid Resources
http://lattice.umiacs.umd.edu/resources/
UMIACS Condor pool
  > 400 processors
BOINC pools
  Clients on campus > 100
  Public (off-campus) clients > 1000
BOINC
Works on the “pull” model, that is:
  One or more servers create workunits
  Clients connect asynchronously, pull down work, and return the results
Clients are relatively lightweight and easy to install and manage
One client can process work for multiple projects
Participants can join teams and are given credit for the work they complete
http://lattice.umiacs.umd.edu/boinc_public
Globus-BOINC Adapter
Consists of a number of components that allow us to run Grid Services on BOINC
  BOINC job manager
  Custom validator and assimilator
Registers BOINC with Globus as a GRAM-addressable resource
BOINC compatibility library eases the process of porting applications to BOINC
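Because BOINC clients are untrusted, results are usually checked before they are accepted. The slides do not say what the Lattice validator does, so the following is only a sketch of one common approach in volunteer computing: send the same workunit to several clients and accept a result once enough copies agree. The function name, the quorum of 2, and the result strings are all hypothetical.

```python
from collections import Counter


def validate(results, quorum=2):
    """Return the canonical result if at least `quorum` copies agree, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= quorum else None


# Three clients returned a result for the same workunit; two agree.
canonical = validate(["lnL=-1234.5", "lnL=-1234.5", "lnL=-999.0"])
# Two clients disagree, so no result can be accepted yet.
undecided = validate(["a", "b"])
```

An assimilator would then take the accepted result and hand it back to the rest of the system, in this case to the Grid service that submitted the job.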
Research Projects Using the Grid
The Laboratory of David Fushman has run protein-protein docking algorithms on Lattice
  CNS is the primary Grid service in this project
Floyd Reed and Holly Mortensen from the Laboratory of Sarah Tishkoff have run a number of population genetics analyses
  MDIV and IM are the primary Grid services
The Laboratory of Molecular Evolution has run statistical phylogenetic analyses
  GSI is the primary Grid service
Results of Grid Usage
IM – 0.13 CPU years (BOINC)
MDIV – 4.93 CPU years (BOINC)
CNS – 12.4 CPU years (BOINC)
GSI – 94.05 CPU years (Condor)
Total: 111.51 CPU years
BOINC participants in 21 countries
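The total can be checked against the per-service figures, and the same numbers give the split between BOINC and Condor. The short script below uses only the values listed above; the variable names are illustrative.

```python
# CPU years per Grid service, as listed on the slide.
cpu_years = {"IM": 0.13, "MDIV": 4.93, "CNS": 12.4, "GSI": 94.05}

# Sum of all services; rounded to the slide's two decimal places.
total = round(sum(cpu_years.values()), 2)

# IM, MDIV, and CNS ran on BOINC; GSI ran on Condor.
boinc_total = round(cpu_years["IM"] + cpu_years["MDIV"] + cpu_years["CNS"], 2)
condor_total = cpu_years["GSI"]

# Percentage of the total contributed by each service.
shares = {name: round(100 * years / total, 1) for name, years in cpu_years.items()}
```

The sum reproduces the stated total of 111.51 CPU years, of which 17.46 came from BOINC and 94.05 from Condor.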
GT4 Research and Development
We are currently upgrading the Grid system to use Globus Toolkit 4.0
  GT4 adheres strictly to emerging and established Web service standards
  Actively developed and supported
  Many components have been greatly improved
    GridFTP/RFT (will replace GASS)
    WS GRAM
    MDS4 (XML based; replaces MDS2, LDAP based)
Our basic architecture remains the same, and the upgrade has been made easier because of tools we have already developed (GSBL, GSG)
More Information
Lattice Website http://lattice.umiacs.umd.edu/