
Grid Computing - An Overview

Michael P. Cummings

Laboratory of Molecular Evolution
Center for Bioinformatics and Computational Biology

Acknowledgments

Core Middleware Development
  Adam Bazinet
  Daniel Myers
  John Fuetsch
  Stephen McLellan, Chris Milliron, Deji Akinyemi

Semantic Web Grid Services/Workflows
  Sung Lee, Fujitsu Laboratories of America
  Nada Hashmi, UMIACS (now CBA, Saudi Arabia)
  David Wang, UMIACS

Outline

Grid computing introduction and motivation
Goals of The Lattice Project
Basic architecture
Our current production Grid system
Implementation details
Results of usage
Research and development

Grid Computing: A Definition

A model of distributed computing that uses geographically and administratively disparate resources. In Grid computing, individual users can access computers and data transparently, without having to consider location, operating system, account administration, and other details. In Grid computing, the details are abstracted, and the resources are virtualized.

Grid Computing: Characteristics

Resources are heterogeneous
Resources are administratively disparate
Resources are geographically disparate
Users do not have to worry about system details (e.g., location, operating system, accounts)

Grid Computing: Advantages

Provides increased resources for research
Utilizes resources already purchased
Space and HVAC needs already met
Little increased administrative burden
Economically and environmentally appealing

Types of Grid Middleware

Heavyweight/feature rich (e.g., Globus Toolkit)
  Multiple users and multiple applications
  Mechanisms for authentication, authorization, communication, file access, resource discovery and specification
  Push model: jobs are assigned to specific resources

Types of Grid Middleware

Desktop Grids (e.g., Berkeley Open Infrastructure for Network Computing [BOINC])
  Single user and single application
  Limited features
  Pull model: clients contact server for jobs

An Example: SETI@home

A scientific experiment that uses Internet-connected computers in the Search for Extraterrestrial Intelligence (SETI), a scientific effort seeking to determine if there is intelligent life outside Earth. The project analyzes radio signals to look for patterns that might be associated with intelligent life.

SETI@home statistics (Monday)

Total participants: 5,521,708
Rate of signup: a new participant every 96 seconds
Effective number of computers: at any given moment there are the equivalent of >412,000 computers working full time
Results received: 2,200,991,756
Total CPU time: 2,555,681 years
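As a rough reading of these figures, the short sketch below derives the average CPU time per returned result and the daily signup rate implied by one new participant every 96 seconds. It assumes a 365.25-day year, which the slide does not specify, and the variable names are illustrative only.

```python
# Back-of-the-envelope arithmetic on the SETI@home figures quoted above.
# Assumption (not from the slides): one year = 365.25 days.
HOURS_PER_YEAR = 365.25 * 24

results_received = 2_200_991_756
total_cpu_years = 2_555_681
seconds_per_signup = 96

# Average CPU time spent per returned result, in hours.
cpu_hours_per_result = total_cpu_years * HOURS_PER_YEAR / results_received

# New participants per day implied by one signup every 96 seconds.
signups_per_day = 24 * 60 * 60 / seconds_per_signup

print(f"{cpu_hours_per_result:.1f} CPU hours per result")
print(f"{signups_per_day:.0f} new participants per day")
```

Under that assumption, each result represents roughly ten hours of CPU time, which gives a sense of the granularity of a single unit of work.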

Why Go Grid?

To speed research
  Parallel execution means higher throughput
To make compute resources commodities
  Analogous to the electrical power grid
To foster efficiency and interaction in the research community
  Use of the Grid spans departments and domains
  Grid resources are typically shared resources

The Lattice Project: Initial Goals

Develop a Grid system for research that:
  Speeds up workflows by “Grid-enabling” various programs
  Is simple and intuitive
  Takes advantage of heterogeneous resources
  Is capable of managing large numbers of jobs (thousands)
  Supports multiple users and lowers the barriers to getting involved
  Is community-driven and supported

Principles of Design

Make use of well-supported open source software
  Globus Toolkit
  BOINC
  Condor
Engineered software should be scalable, modular, and robust
Expose programs as well-defined services
  Arbitrary user-supplied code cannot be run

Grid: Development Challenges

Many middleware systems are not compatible
Middleware is cumbersome
Developing a Grid service is often difficult

Terminology

Client: A Grid user interface OR a machine that performs computation

Grid Service: A Grid-enabled program

Scheduler: A program that decides where Grid jobs will run

Resource: Executes Grid jobs

Basic Architecture (1 of 3)

Software Components

Globus Toolkit version 3.2.1
  Backbone of the Grid
  http://www.globus.org/
Condor-G
  Grid-level scheduler / resource broker
  http://www.cs.wisc.edu/condor/
BOINC: Berkeley Open Infrastructure for Network Computing
  SETI@home-style desktop grid
  http://boinc.berkeley.edu/
Custom components
  GSBL, GSG, Globus-BOINC adapter, MDS-matchmaking bridge, user interface(s), administrative scripts, and much more

Globus Toolkit 3

Key components:
Globus Core
  Grid service hosting environment
GSI – Grid Security Infrastructure
  Uses public key cryptography
  Secures communication
  Authenticates and authorizes Grid users
WS GRAM – Job management
GASS – Point-to-point file transfer
MDS2 – Information provider

Condor-G

Condor-G is part of the Condor suite

Resources and jobs send Condor-G descriptions of themselves called ClassAds

Condor-G matches Grid jobs to suitable resources, then submits and manages them

This process is called matchmaking

BOINC

Most novel feature of our Grid
Public computing model
Untrusted resources
Potentially our largest resource
We have targeted 3 platforms: Windows / Linux x86 / Mac OS X

Our Current Grid System

User Interface

The “Grid Brick”: a machine used to submit Grid jobs
  Our primary interface for Grid users
  Command line clients mimic normal program execution
Lattice Intranet
  Provides instructions for submitting jobs and managing data input and output
  Provides tools for describing and monitoring jobs
Other possibilities:
  Web portal model of job submission
  A client capable of composing complex workflows using Task Computing and Semantic Web technology developed by collaborators at Fujitsu

Basic Architecture – Client/Service

Grid Client Stack

lattice_submit / lattice_retrieve

Service-specific* submit / retrieve scripts

Client.pm – base Perl module

Service-specific* submit / retrieve classes

GSBL – Grid Service Base Library

Globus API

Layer labels in the figure: Command-line Interface, Perl, Java

* Service-specific templates and stubs are created by the Grid Service Generator

Grid Service Stack

Service-specific* Implementation

GSBL – Grid Service Base Library

Globus API

Grid Service Hosting Environment, a.k.a. “the container”

Layer label in the figure: Java

* Service-specific templates and stubs are created by the Grid Service Generator

Tools for Writing Grid Services

Grid Service Base Library (GSBL)
  Java API for building Grid services with the Globus Toolkit
  Shields programmers from having to work with the Globus API directly
  Provides a high-level interface for operations such as job submission and file transfer
Grid Service Generator (GSG)
  Simplifies the process of creating Grid Services
  Intended for use with GSBL

GSBL: Design and Features

Classes for:
  Clients and services (base classes)
  Argument description and processing
  File transfers
  Job submission and control
  Security configuration
  Java synchronization and Globus notifications to paper over event-based model

[Figure: Client and Service stacks side by side, each layered as Application (e.g., BLAST), GSBL, Globus API]

Grid Service Generator

Deploying a Grid service with Globus is absurdly complicated
  Many files, namespaces: lots of potential typos
GSG takes as input a few parameters (service name, location, an XML argument description, etc.) and generates all requisite configuration files and skeleton Java classes
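The idea of generating boilerplate from a few parameters can be sketched with simple text templates. The example below is only an illustration of template-based generation: the templates, file names, and the `generate_service` function are hypothetical, and the actual files produced by GSG are not shown in these slides.

```python
from string import Template

# Hypothetical templates; the real GSG output files and their contents
# are not shown in the slides.
JAVA_SKELETON = Template(
    "package $package;\n\n"
    "public class ${service}Service {\n"
    "    // TODO: service-specific implementation\n"
    "}\n"
)
CONFIG_ENTRY = Template("service.name=$service\nservice.location=$location\n")


def generate_service(service, package, location):
    """Return a mapping of file name -> generated file content."""
    params = {"service": service, "package": package, "location": location}
    return {
        f"{service}Service.java": JAVA_SKELETON.substitute(params),
        f"{service}.properties": CONFIG_ENTRY.substitute(params),
    }


# Example parameters are made up for illustration.
files = generate_service("Blast", "org.example.grid", "services/blast")
for name, content in files.items():
    print(f"--- {name}\n{content}")
```

Because every generated file is derived from the same small set of parameters, a name is typed once rather than repeated by hand across many files, which is where the typos mentioned above tend to creep in.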

Grid Services

Application   Condor         BOINC       BOINC   BOINC
              (Linux/UNIX)   Linux x86   Win32   Mac OS X

BLAST¹        Yes            No          No      No
Clustal W     Yes            Yes         Yes     Yes
CNS           Yes            Yes         Yes     No
Lamarc        Yes            Yes         Yes     Yes
MDIV          Yes            Yes         Yes     Yes
Migrate-N     Yes            Yes         Yes     Yes
Modeltest     Yes            Yes         Yes     Yes
MrBayes       Yes            Yes         Yes     Yes
ms            Yes            Yes         Yes     Yes
Muscle        Yes            Yes         Yes     Yes
PAUP*²        Yes            No          No      No
Phyml         Yes            Yes         Yes     Yes
Pknots        Yes            Yes         Yes     Yes
Seq-gen       Yes            Yes         Yes     Yes
Snn           Yes            Yes         Yes     Yes
ssearch       Yes            Yes         Yes     Yes
Structure³    Yes            No          No      No

Grid Services

Creating Grid Services requires:
  Knowledge of the application
  Techniques for compiling and porting the application to various platforms
  Knowledge of the infrastructure so it can be effectively tested and deployed
Challenges:
  Maintaining bodies of Grid Service code as the number of applications grows and new versions of applications are released
  Minimizing the number of updates that need to be applied when the framework changes

Basic Architecture - Scheduling

Condor-G: ClassAds

Resources and jobs send Condor-G descriptions of themselves called ClassAds
  Jobs require certain capabilities of resources
  Resources advertise their capabilities
Similar to a dating service: central broker points pairs of compatible jobs/resources at each other
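The matchmaking idea can be illustrated with a toy broker in which both sides advertise attributes and state requirements about the other. This is only a sketch: the dictionaries, field names, and resource names are assumptions made for illustration and are not real Condor ClassAd syntax.

```python
# Toy two-sided matchmaking in the spirit of ClassAds. The dictionaries,
# field names, and resource names are illustrative assumptions, not real
# Condor ClassAd syntax.

resource_ads = [
    {
        "name": "pool-1",
        "applications": {"MrBayes", "ssearch"},
        # A resource may also constrain which jobs it accepts.
        "requirements": lambda job: job["cpu_hours"] <= 100,
    },
    {
        "name": "pool-2",
        "applications": {"MrBayes"},
        "requirements": lambda job: job["cpu_hours"] <= 1,
    },
    {
        "name": "pool-3",
        "applications": {"PAUP*"},
        "requirements": lambda job: True,
    },
]

job_ad = {
    "application": "MrBayes",
    "cpu_hours": 10,
    # The job requires that the resource offers its application.
    "requirements": lambda res: "MrBayes" in res["applications"],
}


def matchmake(job, resources):
    """Return names of resources where job and resource accept each other."""
    return [
        res["name"]
        for res in resources
        if job["requirements"](res) and res["requirements"](job)
    ]


matches = matchmake(job_ad, resource_ads)
print(matches)  # ['pool-1']
```

Here `pool-2` offers the application but rejects the job as too long, and `pool-3` accepts any job but lacks the application, so only `pool-1` is a mutual match.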

Condor-G: ClassAds

[Figure: Resources A, B, and C advertise to the Condor Collector: “I have MrBayes!”, “I have SSEARCH!”, “I have PAUP*!” A Condor user announces: “I need MrBayes!”]

[Figure, continued: Resource C and the Condor user: “I hear you have MrBayes?” “Well, let's talk about that...”]

Generating ClassAds

Job ClassAds are generated by the Condor-G job manager
  Job requirements are specified in the Grid service configuration files
Resource ClassAds are generated by extracting information from MDS
  Lattice information providers supply data required for matchmaking

Monitoring and Discovery System (MDS2)

Globus information services component
  LDAP-based (new version XML-based)
Answers questions like:
  What resources are available?
  What capabilities do these resources have?
  What is the load on these resources?
This in turn allows for intelligent decisions to be made in areas such as scheduling and resource accounting

Basic Architecture - Resources

Current Grid Resources

http://lattice.umiacs.umd.edu/resources/
UMIACS Condor pool
  > 400 processors
BOINC pools
  Clients on campus > 100
  Public (off-campus) clients > 1000

BOINC

Works on the “pull” model, that is:
  One or more servers create workunits
  Clients connect asynchronously, pull down work, and return the results
Clients are relatively lightweight and easy to install and manage
One client can process work for multiple projects
Participants can join teams and are given credit for the work they complete
http://lattice.umiacs.umd.edu/boinc_public
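The pull model described above can be sketched as a small in-process simulation: the server only holds a queue of workunits, and the client decides when to ask for more. The class and method names are hypothetical and do not correspond to the BOINC API, and a real deployment would communicate over the network rather than through direct calls.

```python
from collections import deque

# In-process sketch of the pull model. Class and method names are
# hypothetical and do not correspond to the BOINC API.


class WorkServer:
    def __init__(self, workunits):
        self.pending = deque(workunits)  # workunits waiting to be handed out
        self.results = {}                # workunit id -> returned result

    def request_work(self):
        """Called by a client; returns a workunit, or None if none remain."""
        return self.pending.popleft() if self.pending else None

    def report_result(self, workunit_id, result):
        """Called by a client to return the result of a finished workunit."""
        self.results[workunit_id] = result


def run_client(server, compute):
    """A client pulls work until the server has none left."""
    completed = 0
    while (workunit := server.request_work()) is not None:
        workunit_id, payload = workunit
        server.report_result(workunit_id, compute(payload))
        completed += 1
    return completed


# Three workunits of the form (id, payload); the "computation" is squaring.
server = WorkServer([(1, 3), (2, 4), (3, 5)])
done = run_client(server, compute=lambda x: x * x)
print(done, server.results)  # 3 {1: 9, 2: 16, 3: 25}
```

The server never needs to know which clients exist or how to reach them, which is what makes this model a good fit for clients that come and go.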

Globus-BOINC Adapter

Consists of a number of components that allow us to run Grid Services on BOINC
  BOINC job manager
  Custom validator and assimilator
Registers BOINC with Globus as a GRAM-addressable resource
BOINC compatibility library eases the process of porting applications to BOINC
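Because BOINC resources are untrusted, BOINC-style projects commonly send the same workunit to several clients and accept a result only when enough replicas agree. The sketch below illustrates that general idea with a simple quorum rule. The function name and the rule itself are assumptions for illustration; the project's custom validator is not described in these slides.

```python
from collections import Counter

# Sketch of redundancy-based validation. The quorum rule and function name
# are assumptions for illustration; the project's custom validator is not
# described in the slides.


def validate(results, quorum=2):
    """Return the agreed result if at least `quorum` replicas match, else None."""
    if not results:
        return None
    value, count = Counter(results).most_common(1)[0]
    return value if count >= quorum else None


# Two of three replicas agree, so their value is accepted.
print(validate(["-1234.56", "-1234.56", "-1200.00"]))  # -1234.56
# No two replicas agree, so nothing is accepted yet.
print(validate(["a", "b"]))  # None
```

A validated result would then be handed to an assimilator-like step that stores it, while a failed quorum would typically trigger another replica of the workunit.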

Research Projects Using the Grid

The Laboratory of David Fushman has run protein-protein docking algorithms on Lattice
  CNS is the primary Grid service in this project
Floyd Reed and Holly Mortensen from the Laboratory of Sarah Tishkoff have run a number of population genetics analyses
  MDIV and IM are the primary Grid services
The Laboratory of Molecular Evolution has run statistical phylogenetic analyses
  GSI is the primary Grid service

Results of Grid Usage

IM – 0.13 CPU years (BOINC)
MDIV – 4.93 CPU years (BOINC)
CNS – 12.4 CPU years (BOINC)
GSI – 94.05 CPU years (Condor)
Total: 111.51 CPU years
BOINC participants in 21 countries
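As a quick check of the figures listed above, the sketch below sums the per-service CPU years and computes the share contributed by BOINC. It uses `Decimal` so that the sum is exact; the dictionary layout is just one convenient way to hold the numbers.

```python
from decimal import Decimal

# CPU years per Grid service, as listed above: (CPU years, resource type).
usage = {
    "IM": (Decimal("0.13"), "BOINC"),
    "MDIV": (Decimal("4.93"), "BOINC"),
    "CNS": (Decimal("12.4"), "BOINC"),
    "GSI": (Decimal("94.05"), "Condor"),
}

total = sum(years for years, _ in usage.values())
boinc = sum(years for years, kind in usage.values() if kind == "BOINC")

print(f"Total: {total} CPU years")
print(f"BOINC: {boinc} CPU years ({boinc / total:.1%} of total)")
```

The total reproduces the 111.51 CPU years reported, of which 17.46 CPU years came from BOINC and the remainder from the Condor pool.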

GT4 Research and Development

We are currently upgrading the Grid system to use Globus Toolkit 4.0
  GT4 adheres strictly to emerging and established Web service standards
  Actively developed and supported
  Many components have been greatly improved
    GridFTP/RFT (will replace GASS)
    WS GRAM
    MDS4 (XML based; replaces MDS2, LDAP based)
Our basic architecture remains the same, and the upgrade has been made easier because of tools we have already developed (GSBL, GSG)

More Information

Lattice Website http://lattice.umiacs.umd.edu/
