December 8 & 9, 2005, Austin, TX
SURA Cyberinfrastructure Workshop Series: Grid Technology: The Rough Guide

High Level Grid Services

Warren Smith
Texas Advanced Computing Center
University of Texas


Outline
• Grid Monitoring
  – Ganglia
  – MonALISA
  – Nagios
  – Others
• Workflow
  – Condor DAGMan (and Condor-G)
  – Pegasus
• Data
  – Storage Resource Broker
  – Replica Location Service
  – Distributed file systems


Other High Level Services (Not Covered)
• Resource Brokering
• Metascheduling
  – GRMS, MARS
• Credential issuance
  – PURSE, GAMA
• Authorization
  – Shibboleth
  – VOMS
  – CAS


Grid Monitoring
• Ganglia
• MonALISA
• Nagios
• Others


Ganglia
http://ganglia.sourceforge.net
• Monitors clusters and aggregations of clusters
• Collects system status information
  – Provides it in XML documents
  – Displays it graphically via a web interface
• Can be subscribed to and aggregated across multiple clusters
• Focus on simplicity and performance
  – Can monitor thousands of systems
• MDS and MonALISA can consume information provided by Ganglia
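Because gmond publishes its status as XML, a consumer only needs an XML parser. The Python sketch below extracts per-host metric values from a hand-written sample document. The element and attribute names (GANGLIA_XML, CLUSTER, HOST, METRIC, NAME, VAL) follow the layout of Ganglia 3.x output as we understand it and should be checked against a real gmond, which typically serves this document to TCP clients (by default on port 8649).

```python
import xml.etree.ElementTree as ET

# Hand-written sample in the style of gmond's XML output (assumed layout:
# GANGLIA_XML > CLUSTER > HOST > METRIC, with values in NAME/VAL attributes).
SAMPLE = """
<GANGLIA_XML VERSION="3.0" SOURCE="gmond">
  <CLUSTER NAME="example-cluster">
    <HOST NAME="node01">
      <METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS=""/>
      <METRIC NAME="cpu_num" VAL="2" TYPE="uint16" UNITS="CPUs"/>
    </HOST>
    <HOST NAME="node02">
      <METRIC NAME="load_one" VAL="1.50" TYPE="float" UNITS=""/>
      <METRIC NAME="cpu_num" VAL="4" TYPE="uint16" UNITS="CPUs"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>
"""


def metrics_by_host(xml_text):
    """Return {host_name: {metric_name: value_string}} for every HOST element."""
    root = ET.fromstring(xml_text.strip())
    result = {}
    for host in root.iter("HOST"):
        result[host.get("NAME")] = {
            metric.get("NAME"): metric.get("VAL") for metric in host.iter("METRIC")
        }
    return result


if __name__ == "__main__":
    for host, metrics in sorted(metrics_by_host(SAMPLE).items()):
        print(host, "load_one =", metrics["load_one"])
```

Values stay as strings here; a real consumer would convert them using each metric's TYPE attribute.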


gmond
• Ganglia Monitoring Daemon
• Runs on each resource being monitored
• Collects a standard set of information
• Configuration file specifies
  – When to collect information
  – When to send
    • Based on time and/or change
  – Who to send to
  – Who to allow to request
• Supports UDP unicast, UDP multicast, TCP
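The "time and/or change" rule can be illustrated with a small decision object: a value is sent when it has moved by more than a value threshold since the last send, or when a time threshold has elapsed so that receivers still hear from the host. This is a simplified sketch; the threshold names echo the time_threshold and value_threshold settings of Ganglia 3.x gmond.conf, but gmond's exact behavior may differ.

```python
import time


class MetricSender:
    """Decide when a gmond-like agent should send a metric value.

    Sends on the first sample, when the value has changed by more than
    value_threshold since the last send, or when time_threshold seconds
    have passed since the last send.
    """

    def __init__(self, time_threshold, value_threshold):
        self.time_threshold = time_threshold
        self.value_threshold = value_threshold
        self.last_value = None
        self.last_sent = None

    def should_send(self, value, now=None):
        now = time.time() if now is None else now
        if self.last_sent is None:
            send = True  # nothing sent yet: always send the first sample
        else:
            changed = abs(value - self.last_value) > self.value_threshold
            stale = now - self.last_sent >= self.time_threshold
            send = changed or stale
        if send:
            self.last_value = value
            self.last_sent = now
        return send


if __name__ == "__main__":
    sender = MetricSender(time_threshold=60, value_threshold=0.5)
    for t, load in [(0, 1.0), (10, 1.2), (20, 2.0), (85, 2.1)]:
        print(t, load, sender.should_send(load, now=t))
```

Sending on change keeps traffic low when values are steady, while the time threshold acts as a heartbeat.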


Information collected by gmond

Name | Description | Linux FreeBSD Solaris AIX MacOS X IRIX HPUX Tru64
boottime | System boot timestamp | X X X X X X X X
bread_sec | Buffer reads per second | X
bwrite_sec | Buffer writes per second | X
bytes_in | Number of bytes in per second | X X X
bytes_out | Number of bytes out per second | X X X
cpu_aidle | Percent of time since boot idle CPU | X X
cpu_idle | Percent CPU idle | X X X X X X X X
cpu_intr | Time spent processing interrupts | X
cpu_nice | Percent CPU nice | X X X X
cpu_num | Number of CPUs | X X X X X X X X
cpu_speed | Speed in MHz of CPU | X X X X X X X X
cpu_ssys | Time in kernel mode | X
cpu_system | Percent CPU system | X X X X X X X
cpu_user | Percent CPU user | X X X X X X X
cpu_wait | Time spent waiting | X
cpu_wio | Time spent in I/O wait | X
disk_free | Total free disk space | X X
disk_total | Total available disk space | X X
load_fifteen | Fifteen minute load average | X X X X X X X X
load_five | Five minute load average | X X X X X X X X
load_one | One minute load average | X X X X X X X X
location | GPS coordinates for host | X X X X X X X X
lread_sec | Linear reads per second | X
lwrite_sec | Linear writes per second | X
machine_type | Machine hardware (uname -m) | X X X X X X X X
mem_arm | Available real memory | X
mem_avm | Available virtual memory | X
mem_buffers | Amount of buffered memory | X X X X
mem_cached | Amount of cached memory | X X X X
mem_free | Amount of available memory | X X X X X X X X
mem_rm | Total real memory | X
mem_shared | Amount of shared memory | X X X X
mem_total | Total memory | X X X X X X X X
mem_vm | Total virtual memory | X
mtu | Network maximum transmission unit | X X X X X X X X
os_name | Operating system name | X X X X X X X X
os_release | Operating system release (version) | X X X X X X X X
part_max_used | Maximum percent used for all partitions | X X
phread_sec | Physical reads per second | X
phwrite_sec | Physical writes per second | X
pkts_in | Packets in per second | X X X
pkts_out | Packets out per second | X X X
proc_run | Total number of running processes | X X X X X
proc_total | Total number of processes | X X X X X X
rcache | Read cache hit ratio | X
swap_free | Amount of available swap memory | X X X X X X X
swap_total | Total amount of swap memory | X X X X X X X
sys_clock | Current time on host | X X X X X X X X
wcache | Write cache hit ratio | X


gmetric
• Program to provide custom information to Ganglia
  – e.g. CPU temperature, batch queue length
• Uses the gmond configuration file to determine who to send to
• Executed as a cron job
  – Execute command(s) to gather the data
  – Execute gmetric to send data
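A cron-driven gmetric wrapper might look like the following Python sketch. read_queue_length is a hypothetical placeholder for whatever command gathers the data, and the long option names (--name, --value, --type, --units) are assumed from Ganglia 3.x gmetric; check gmetric --help on your installation. If gmetric is not on the PATH, the script only prints the command it would have run.

```python
import shutil
import subprocess


def build_gmetric_cmd(name, value, mtype="float", units=""):
    """Build a gmetric command line (option names assumed from Ganglia 3.x)."""
    return ["gmetric", "--name", name, "--value", str(value),
            "--type", mtype, "--units", units]


def read_queue_length():
    """Hypothetical data gatherer: replace with a real batch-system query."""
    return 17


def main():
    cmd = build_gmetric_cmd("queue_length", read_queue_length(), "uint32", "jobs")
    if shutil.which("gmetric") is None:
        print("gmetric not found; would run:", " ".join(cmd))
        return
    # gmetric reads the gmond configuration file to decide where to send.
    subprocess.run(cmd, check=True)


if __name__ == "__main__":
    main()
```

A crontab line such as */5 * * * * python3 /path/to/queue_metric.py (path hypothetical) would report the value every five minutes.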


gmetad
• Aggregates information from gmonds
• Configuration file specifies which gmonds to get data from
  – Connects to gmonds using TCP
• Stores information in a Round Robin Database (RRD)
  – Small database where data for each attribute is stored in time order
  – Maximum size
  – Oldest data is forgotten
• PHP scripts to display RRD data as web pages
  – Graphs over time
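The round-robin idea behind RRD can be modeled with a bounded queue: storage never grows past a fixed number of samples, and each new sample pushes out the oldest one. This toy Python class is only an illustration; real RRDtool archives also consolidate samples into fixed time steps (averages, maxima, and so on), which is not modeled here.

```python
from collections import deque


class RoundRobinArchive:
    """Fixed-size, time-ordered store: once full, the oldest sample is dropped."""

    def __init__(self, max_samples):
        self.samples = deque(maxlen=max_samples)  # deque discards from the left

    def add(self, timestamp, value):
        self.samples.append((timestamp, value))

    def values(self):
        return [value for _, value in self.samples]


if __name__ == "__main__":
    rra = RoundRobinArchive(max_samples=3)
    for t, v in [(0, 0.1), (15, 0.3), (30, 0.2), (45, 0.9)]:
        rra.add(t, v)
    print(rra.values())  # the sample from t=0 has been forgotten
```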


Who’s Using Ganglia?
• PlanetLab
• Lots of clusters
  – SDSC
  – NASA Goddard
  – Naval Research Lab
  – …


MonALISA
http://monalisa.cacr.caltech.edu
• Distributed monitoring system
• Agent-based design
• Written in Java
• Uses JINI & SOAP/WSDL
  – For locating services and communicating
• Gathers information using other systems
  – SNMP, Ganglia, MRTG, Hawkeye, custom
• Clients
  – Locate and subscribe to services that provide monitoring information
  – GUI client, web client, administrative client
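MonALISA clients first locate services and then subscribe to the values they care about. The Python sketch below shows that locate-then-subscribe pattern inside a single process. MonitoringService, LookupService, and the parameter names are hypothetical; the real system does this across the network with JINI lookup services and Java.

```python
from collections import defaultdict


class MonitoringService:
    """Hypothetical service that publishes named parameters to subscribers."""

    def __init__(self, name, parameters):
        self.name = name
        self.parameters = set(parameters)
        self._subscribers = defaultdict(list)  # parameter -> list of callbacks

    def subscribe(self, parameter, callback):
        self._subscribers[parameter].append(callback)

    def publish(self, parameter, value):
        for callback in self._subscribers[parameter]:
            callback(self.name, parameter, value)


class LookupService:
    """Hypothetical discovery registry that clients query by parameter name."""

    def __init__(self):
        self._services = []

    def register(self, service):
        self._services.append(service)

    def find(self, parameter):
        return [s for s in self._services if parameter in s.parameters]


if __name__ == "__main__":
    lookup = LookupService()
    farm = MonitoringService("farm-a", ["load", "running_jobs"])
    lookup.register(farm)
    for service in lookup.find("load"):
        service.subscribe("load", lambda src, p, v: print(f"{src}: {p}={v}"))
    farm.publish("load", 0.7)
```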


Monitoring I2 Network Traffic, Grid03 Farms and Jobs


MonALISA Services
• Autonomous, self-describing services
  – Built on a generic Dynamic Distributed Services Architecture
• Each monitoring service stores data in a relational database
• Automatic update of monitoring services
• Lookup discovery service


Who’s Using MonALISA?
• Open Science Grid
  – Included in the Virtual Data Toolkit
• Internet2
• Abilene
• Compact Muon Solenoid
• Many others


Nagios Overview
• A monitoring framework
  – Configurable
  – Extensible
• Provides a relatively comprehensive set of functionality
• Supports distributed monitoring
• Supports taking actions in addition to monitoring
• Large community using and extending it
• Doesn’t store historical data in a true database
• Quality of add-ons varies


Nagios Architecture
[Figure: each remote system runs Nagios with its plugins and configuration files and forwards check results with send_nsca to a central collector. The central collector runs Nagios with the NSCA daemon, its own plugins, configuration files, and log files, plus httpd serving the Nagios CGIs.]


Nagios Features I
• Web interface
  – Current status, graphs
• Monitoring
  – Built-in monitoring of a number of properties
  – People provide plugins to monitor other properties; we can do the same
  – Periodic monitoring with user-defined periods
• Thresholds to indicate problems
• Actions when problems occur
  – Notification
    • Email, page, extensible
  – Actions to attempt to fix the problem (e.g. restart a daemon)
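A Nagios plugin is an ordinary program: it prints a one-line status and reports its result through the exit code (0 = OK, 1 = WARNING, 2 = CRITICAL, 3 = UNKNOWN). The Python sketch below applies warning and critical thresholds to the one-minute load average. The default thresholds of 4.0 and 8.0 are arbitrary assumptions, and the standard plugin distribution already ships a check_load plugin for this particular job.

```python
import os
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a measured value onto a Nagios state using upper thresholds."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def main(warn=4.0, crit=8.0):
    try:
        load1, _, _ = os.getloadavg()  # available on Unix-like systems only
    except (AttributeError, OSError):
        print("LOAD UNKNOWN - load average not available")
        return UNKNOWN
    state = evaluate(load1, warn, crit)
    print(f"LOAD {LABELS[state]} - load1={load1:.2f}")
    return state


if __name__ == "__main__":
    sys.exit(main())
```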


Nagios Features II
• Escalations
  – If a problem occurs n times, do x
    • Attempt to fix automatically
  – If a problem occurs more than n times, do y
    • Open a ticket in the trouble ticket system
  – …
• Distributed monitoring
  – A Nagios daemon can test things all over
  – Can also have Nagios daemons on multiple systems
    • Certain daemons can act as central collection points
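The escalation rule can be sketched as a counter of consecutive failures. In this hypothetical Python model, the first n failures trigger an automatic fix attempt (x), any further failures open a trouble ticket (y), and a successful check resets the count. Nagios itself expresses escalations declaratively in its configuration files rather than in code like this.

```python
class Escalator:
    """Hypothetical escalation policy driven by consecutive check failures."""

    def __init__(self, n):
        self.n = n
        self.failures = 0

    def record(self, ok):
        """Record one check result and return the action to take."""
        if ok:
            self.failures = 0
            return "none"
        self.failures += 1
        if self.failures <= self.n:
            return "attempt_fix"   # x: e.g. restart the daemon
        return "open_ticket"       # y: hand off to the trouble ticket system


if __name__ == "__main__":
    escalator = Escalator(n=2)
    print([escalator.record(ok) for ok in [False, False, False, True, False]])
```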


Who’s Using Nagios?
• It’s included in a number of Unix distributions
  – Debian
  – SUSE
  – Gentoo
  – OpenBSD
• Nagios users can register with the site
  – 986 sites have registered
  – ~200,000 hosts monitored
  – ~720,000 services monitored


TeraGrid’s Inca
• Hierarchical status monitoring
  – Groups tests into logical sets
  – Supports many levels of detail and summarization
• Flexible, scalable architecture
  – Very simple reporter API
  – Can use existing test scripts (unit tests, status tools)
  – Hierarchical controllers
  – Several query/display tools


And Many Others…
• SNMP
  – OpenNMS
  – HP OpenView
• Big Brother / Big Sister
• Globus MDS
• ACDC (U Buffalo)
• GridCat
• GPIR (TACC)
• …


Workflow
• Condor DAGMan
  – Starting with Condor-G
• Pegasus


Workflow Definition
• Set of tasks with dependencies
• Tasks can be anything, but in grids:
  – Execute programs
  – Move data
• Dependencies can be
  – Control - “do T2 after T1 finishes”
  – Data - “T2 input 1 comes from T1 output 1”
• Can be acyclic or have cycles/iterations
• Can have conditional execution
• A large variety of types of workflows
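For the acyclic case, the core loop of a workflow engine is a topological traversal: run every task whose dependencies are done, then release its children. The Python sketch below uses the standard library's graphlib (Python 3.9+) on a hypothetical four-task workflow and groups together the tasks that could run in parallel. Cycles, iterations, and conditional execution would need a richer engine.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: T1 stages data in, T2 and T3 process it,
# T4 combines their outputs. Each entry maps a task to its prerequisites.
workflow = {
    "T2": {"T1"},
    "T3": {"T1"},
    "T4": {"T2", "T3"},
}


def ready_batches(graph):
    """Yield groups of tasks whose dependencies have all completed."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the graph has a cycle
    while sorter.is_active():
        batch = sorted(sorter.get_ready())
        yield batch
        sorter.done(*batch)  # pretend the whole batch finished successfully


if __name__ == "__main__":
    for batch in ready_batches(workflow):
        print(batch)
```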


Condor-G: Condor + Globus
http://www.cs.wisc.edu/condor
• Submit your jobs to Condor
  – Jobs say they want to run via Globus
• Condor manages your jobs
  – Queuing, fault tolerance
• Submits jobs to resources via Globus


Globus Universe
• Condor has a number of universes
  – Standard - to take advantage of features like checkpointing and redirecting file I/O
  – Vanilla - to run jobs without the frills
  – Java - to run Java programs
• Globus universe to run jobs via Globus
  – Universe = Globus
  – Which Globus gatekeeper to use
  – Optional: location of the file containing your Globus certificate

    universe = globus
    globusscheduler = beak.cs.wisc.edu/jobmanager
    executable = progname
    queue


How Condor-G Works
[Figure, built up over several slides: a Personal Condor whose Schedd holds 600 Globus jobs starts a GridManager; the GridManager contacts a JobManager on the Globus resource, which hands the user job to the resource's LSF batch system.]
• Schedd (Personal Condor)
  – Queues, submits, and manages jobs
  – Available commands: condor_submit, condor_rm, condor_q, condor_hold, …
• LSF (Globus resource)
  – Manages cluster resources


Globus Universe Fault Tolerance
• Submit side failure:
  – All relevant state for each submitted job is stored persistently in the Condor job queue.
  – This persistent information allows the Condor GridManager, upon restart, to read the state information and reconnect to JobManagers that were running at the time of the crash.
• Execute side:
  – Condor worked with Globus to improve fault tolerance
• X.509 proxy expiration
  – Condor can put jobs on hold and email the user to refresh the proxy


Condor DAGMan
• Directed Acyclic Graph Manager
• DAGMan allows you to specify the dependencies between your Condor jobs, so it can manage them automatically for you.
• (e.g., “Don’t run job ‘B’ until job ‘A’ has completed successfully.”)


What is a DAG?
• A DAG is the data structure used by DAGMan to represent these dependencies.
• Each job is a “node” in the DAG.
• Each node can have any number of “parent” or “child” nodes, as long as there are no loops!
[Figure: example DAG in which Job A is the parent of Jobs B and C, which are both parents of Job D]


Defining a DAG
• A DAG is defined by a .dag file, listing each of its nodes and their dependencies:

    # diamond.dag
    Job A a.sub
    Job B b.sub
    Job C c.sub
    Job D d.sub
    Parent A Child B C
    Parent B C Child D

• Each node will run the Condor job specified by its accompanying Condor submit file
• Each node can have a PRE and a POST script
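The .dag syntax is simple enough to parse directly. This Python sketch reads only the JOB and PARENT ... CHILD lines, treating keywords case-insensitively as the mixed-case example suggests, and ignores every other DAGMan keyword. It illustrates the file's structure and is not a replacement for DAGMan's own parser.

```python
from graphlib import TopologicalSorter


def parse_dag(text):
    """Parse JOB and PARENT ... CHILD lines of a DAGMan .dag file.

    Returns (jobs, parents): jobs maps node name -> submit file, and
    parents maps node name -> set of parent node names.
    """
    jobs, parents = {}, {}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0].startswith("#"):
            continue  # skip blank lines and comments
        keyword = tokens[0].upper()
        if keyword == "JOB":
            jobs[tokens[1]] = tokens[2]
            parents.setdefault(tokens[1], set())
        elif keyword == "PARENT":
            split = [t.upper() for t in tokens].index("CHILD")
            for child in tokens[split + 1:]:
                parents.setdefault(child, set()).update(tokens[1:split])
    return jobs, parents


DIAMOND = """# diamond.dag
Job A a.sub
Job B b.sub
Job C c.sub
Job D d.sub
Parent A Child B C
Parent B C Child D
"""

if __name__ == "__main__":
    jobs, parents = parse_dag(DIAMOND)
    for node in TopologicalSorter(parents).static_order():
        print(node, "->", jobs[node])
```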


Submitting a DAG
• To start your DAG, just run condor_submit_dag with your .dag file, and Condor will start a personal DAGMan daemon to begin running your jobs:

    % condor_submit_dag diamond.dag

• condor_submit_dag submits a Scheduler Universe job with DAGMan as the executable.
• Thus the DAGMan daemon itself runs as a Condor job, so you don’t have to baby-sit it.


Running a DAG
• DAGMan manages the submission of your jobs to Condor based on the DAG dependencies.
  – Can configure throttling of job submission
• In case of a failure, DAGMan creates a “rescue” file with the current state of the DAG.
  – Failures can be retried a configurable number of times
  – The rescue file can be used to restore the prior state of the DAG when restarting
• Once the DAG is complete, the DAGMan job itself is finished, and exits
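DAGMan's retry and rescue behavior can be modeled as follows: each ready node is attempted up to retries + 1 times, children of a node that keeps failing are never started, and the nodes that did finish are recorded so a restart can skip them. The Python sketch is a simplified model, and the "DONE node" lines it produces are a hypothetical format, not DAGMan's actual rescue file syntax.

```python
from graphlib import TopologicalSorter


def run_dag(parents, run_node, retries=2):
    """Run nodes in dependency order, retrying each failure up to `retries` times.

    parents maps node -> set of parent nodes; run_node(name) returns True on
    success. Returns (done, failed). Descendants of a failed node never run.
    """
    sorter = TopologicalSorter(parents)
    sorter.prepare()
    done, failed = [], []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        if not ready:
            break  # remaining nodes are blocked behind failed ones
        for node in ready:
            # any() stops at the first successful attempt.
            if any(run_node(node) for _ in range(retries + 1)):
                done.append(node)
                sorter.done(node)
            else:
                failed.append(node)
    return done, failed


def rescue_lines(done):
    """Hypothetical, simplified rescue record: one DONE line per finished node."""
    return [f"DONE {node}" for node in done]


if __name__ == "__main__":
    diamond = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}
    done, failed = run_dag(diamond, lambda name: name != "B")
    print("failed:", failed)
    print("\n".join(rescue_lines(done)))
```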


Who’s Using Condor-G & DAGMan?
• Pegasus
• LIGO, Atlas, CMS, …
• gLite
• TACC
• DAGMan is available on every Condor pool


Pegasus
http://pegasus.isi.edu
• Pegasus - Planning for Execution on Grids
  – Intelligently decides how to run a workflow on a grid
• Takes as input an abstract workflow
  – Abstract DAG in XML (DAX)
• Generates a concrete workflow
  – Selects computer systems (MDS)
  – Selects file replicas (RLS)
• Executes the workflow (Condor DAGMan)


Scientific Analysis Workflow Evolution
[Figure: the steps construct the analysis, select the input data, map the workflow onto available resources, and execute the workflow carry a workflow from a workflow template, to an abstract workflow, to a concrete workflow, to tasks executed on grid resources; the Science Gateway, Pegasus, and Condor appear as the supporting components.]


Pegasus Workflows
• Abstract workflow
  – Edges are data dependencies
    • Implicit data movement
  – Processing on the data
• Concrete workflow
  – Edges are control flow
    • Explicit data movement as tasks
• Acyclic
• Supports parallelism
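The difference between the two workflow forms can be sketched as a transformation: each data dependency (a task consuming a file another task produces) becomes a control edge, and each input that no task produces gets an explicit stage-in task. In this Python sketch a plain dictionary stands in for the replica catalog, every task is mapped to one hypothetical site, and the first listed replica is used; Pegasus itself consults MDS and RLS, and also handles stage-out and registration, which are not modeled.

```python
def concretize(tasks, replicas, site):
    """Turn a data-dependency (abstract) workflow into control-flow edges.

    tasks: name -> {"in": [logical files], "out": [logical files]}
    replicas: logical file -> list of physical URLs (stand-in for a catalog)
    site: hypothetical execution site name
    Returns (nodes, edges) with edges as (parent, child) pairs.
    """
    producer = {lfn: name for name, t in tasks.items() for lfn in t["out"]}
    nodes = [f"{name}@{site}" for name in tasks]
    edges = []
    for name, t in tasks.items():
        job = f"{name}@{site}"
        for lfn in t["in"]:
            if lfn in producer:
                # Implicit data dependency becomes an explicit control edge.
                edges.append((f"{producer[lfn]}@{site}", job))
            else:
                # Raw input: add an explicit stage-in task from the first replica.
                stage = f"stage_in:{replicas[lfn][0]}"
                if stage not in nodes:
                    nodes.append(stage)
                edges.append((stage, job))
    return nodes, edges


if __name__ == "__main__":
    abstract = {
        "extract": {"in": ["raw.dat"], "out": ["features.dat"]},
        "analyze": {"in": ["features.dat"], "out": ["result.dat"]},
    }
    catalog = {"raw.dat": ["gsiftp://host-a.example.org/data/raw.dat"]}
    nodes, edges = concretize(abstract, catalog, "siteX")
    for parent, child in edges:
        print(parent, "->", child)
```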


Who’s Using Pegasus?
• LIGO
• Atlas high energy physics application
• Southern California Earthquake Center (SCEC)
• Astronomy: Montage and Galaxy Morphology applications
• Bioinformatics
• Tomography


Data
• Storage Resource Broker
• Replica Location Service


Storage Resource Broker (SRB)
http://www.sdsc.edu/srb
• Manages collections of data
  – In many cases, the data are files
• Provides a logical namespace
• Maps logical names to physical instances
• Associates metadata with logical names
  – Metadata Catalog (MCat)
• Interfaces to a variety of storage
  – Local disk
  – Parallel file systems
  – Archives
  – Databases
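The logical namespace can be pictured as a map from one logical name to several physical locations, resolved at access time. The Python sketch below is a toy model; the "resource:path" location format and the rule of picking the first replica on an available resource are assumptions for illustration, not how SRB or MCat actually store or select replicas.

```python
class LogicalNamespace:
    """Toy logical-to-physical mapping in the spirit of SRB (names hypothetical)."""

    def __init__(self):
        self._replicas = {}  # logical name -> list of "resource:path" locations

    def register(self, logical_name, physical_location):
        self._replicas.setdefault(logical_name, []).append(physical_location)

    def resolve(self, logical_name, available):
        """Return the first replica whose storage resource is currently available."""
        for location in self._replicas.get(logical_name, []):
            resource = location.split(":", 1)[0]
            if resource in available:
                return location
        raise LookupError(f"no available replica for {logical_name}")


if __name__ == "__main__":
    ns = LogicalNamespace()
    ns.register("/home/alice/run1.dat", "disk1:/srb/vault/run1.dat")
    ns.register("/home/alice/run1.dat", "archive1:/hpss/alice/run1.dat")
    # disk1 is down, so the archive copy is used.
    print(ns.resolve("/home/alice/run1.dat", available={"archive1"}))
```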


SRB Client Implementations
• A set of basic APIs
  – Over 160 APIs
  – Used by all clients to make requests to servers
• Scommands
  – Unix-like command-line utilities for Unix and Windows platforms
  – Over 60: Sls, Scp, Sput, Sget, …


SRB Client Implementations
• inQ – Windows GUI browser
• Jargon – Java SRB client classes
  – Pure Java implementation
• mySRB – Web-based GUI
  – Run using a web browser
• Java Admin Tool
  – GUI for user and resource management
• Matrix – Web service for SRB workflow


Example Read
[Figure: a read application requests data by logical name from an SRB server. MCAT provides 1. logical-to-physical mapping, 2. identification of replicas, and 3. access & audit control. SRB servers broker the request peer-to-peer, and an SRB agent performs the data access on one of the replicas, R1 or R2.]


Authentication
• Grid Security Infrastructure
  – PKI certificates
• Challenge-response mechanism
  – No passwords sent over the network
• Ticket
  – Valid for a specified time period or number of accesses
• Generic Security Service API
  – Authentication of server to remote storage
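Two of the mechanisms above can be sketched generically. The challenge-response part uses an HMAC over a random server nonce, so the secret itself never crosses the network; this is a standard construction, not necessarily the exact scheme SRB uses. The Ticket class models a ticket that stops working after a time limit or a number of uses; its fields are hypothetical.

```python
import hashlib
import hmac
import secrets
import time


def make_challenge():
    """Server side: a fresh random nonce for each login attempt."""
    return secrets.token_bytes(16)


def respond(shared_secret, challenge):
    """Client side: prove knowledge of the secret without sending it."""
    return hmac.new(shared_secret, challenge, hashlib.sha256).digest()


def verify(shared_secret, challenge, response):
    """Server side: recompute the expected response, compare in constant time."""
    return hmac.compare_digest(respond(shared_secret, challenge), response)


class Ticket:
    """Access ticket valid until a deadline and/or for a number of uses."""

    def __init__(self, expires_at=None, max_uses=None):
        self.expires_at = expires_at
        self.remaining = max_uses

    def use(self, now=None):
        """Consume one access; return False if the ticket is no longer valid."""
        now = time.time() if now is None else now
        if self.expires_at is not None and now >= self.expires_at:
            return False
        if self.remaining is not None:
            if self.remaining <= 0:
                return False
            self.remaining -= 1
        return True


if __name__ == "__main__":
    secret = b"shared-secret"  # hypothetical key for illustration only
    challenge = make_challenge()
    print("login ok:", verify(secret, challenge, respond(secret, challenge)))
    ticket = Ticket(max_uses=2)
    print([ticket.use() for _ in range(3)])  # third access is refused
```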


Authorization
• Collection-owned data
  – At each remote storage system, an account ID is created under which the data grid stores files
• User authenticates to SRB
• SRB checks access controls
• SRB server authenticates to a remote SRB server
• Remote SRB server authenticates to the remote storage repository


Metadata in SRB
• SRB system metadata
• Free-form metadata (user-defined)
  – Attribute-Value-Unit triplets…
• Extensible schema metadata
  – User-defined tables integrated into the MCAT core schema
• External database
• Metadata operations
  – Metadata insertion through user interfaces
  – Bulk metadata insertion
  – Template-based metadata extraction
  – Query metadata through well-defined interfaces
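Free-form metadata stored as Attribute-Value-Unit triplets lends itself to simple attribute queries. The Python sketch below keeps AVUs in memory and returns the logical names whose attribute value satisfies a predicate. The class and the example attribute names are hypothetical; in SRB the triplets live in MCAT and are queried through its interfaces.

```python
from collections import defaultdict


class AVUStore:
    """Toy store of Attribute-Value-Unit triplets attached to logical names."""

    def __init__(self):
        self._avus = defaultdict(list)  # logical name -> [(attribute, value, unit)]

    def add(self, logical_name, attribute, value, unit=""):
        self._avus[logical_name].append((attribute, value, unit))

    def query(self, attribute, predicate):
        """Logical names having `attribute` with a value that satisfies predicate."""
        return sorted(
            name
            for name, triplets in self._avus.items()
            if any(a == attribute and predicate(v) for a, v, _ in triplets)
        )


if __name__ == "__main__":
    store = AVUStore()
    store.add("/obs/img1.fits", "exposure", 30, "s")
    store.add("/obs/img2.fits", "exposure", 120, "s")
    store.add("/obs/img2.fits", "filter", "R")
    print(store.query("exposure", lambda v: v > 60))
```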


Who’s Using SRB?
• Very large number of users
• A sample:
  – National Virtual Observatory
  – Large Hadron Collider
  – NASA
  – NCAR
  – BIRN


Replica Location Service (RLS)
http://www.globus.org/toolkit/data/rls/
• Maintains a mapping from logical file names to physical file names
  – 1 logical file to 1+ physical files
• Improves performance and fault tolerance when accessing data
• Supports user-defined attributes of logical files
• Component of the Globus Toolkit
  – WS-RF service
• RLS was designed and implemented in a collaboration between the Globus project and the EU DataGrid project


Replica Location Service In Context
[Figure: RLS shown alongside the Reliable Data Transfer Service, GridFTP, the Reliable Replication Service, Replica Consistency Management Services, and a Metadata Service]
• RLS is one component in a data management architecture
• Provides a simple, distributed registry of mappings
• Consistency management is provided by higher-level services


RLS Features
[Figure: two tiers, Replica Location Index (RLI) nodes above Local Replica Catalogs (LRCs)]
• Local Replica Catalogs (LRCs) contain consistent information about logical-to-target mappings
• Replica Location Index (RLI) nodes aggregate information about one or more LRCs
• LRCs use soft state update mechanisms to inform RLIs about their state: relaxed consistency of index
• Optional compression of state updates reduces communication, CPU and storage overheads
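The two-tier LRC/RLI design can be sketched in a few classes: each LRC holds authoritative mappings, and an RLI only remembers which LRCs reported a logical name recently, discarding reports older than a timeout. This Python model is a simplification with hypothetical names; it sends full updates and does not model the optional compression of updates.

```python
class LRC:
    """Local Replica Catalog: authoritative LFN -> PFN mappings for one site."""

    def __init__(self, name):
        self.name = name
        self.mappings = {}  # logical file name -> set of physical file names

    def add(self, lfn, pfn):
        self.mappings.setdefault(lfn, set()).add(pfn)


class RLI:
    """Replica Location Index: which LRCs know an LFN, kept as soft state."""

    def __init__(self, timeout):
        self.timeout = timeout
        self.entries = {}  # LRC name -> (set of LFNs, time of last update)

    def receive_update(self, lrc, now):
        # Full soft-state update: replaces whatever this LRC reported before.
        self.entries[lrc.name] = (set(lrc.mappings), now)

    def lookup(self, lfn, now):
        """Names of LRCs whose recent updates mention lfn; stale reports are ignored."""
        return sorted(
            name
            for name, (lfns, updated) in self.entries.items()
            if now - updated < self.timeout and lfn in lfns
        )


if __name__ == "__main__":
    site_a, site_b = LRC("lrc-a"), LRC("lrc-b")
    site_a.add("lfn://run1", "gsiftp://a.example.org/run1")
    site_b.add("lfn://run1", "gsiftp://b.example.org/run1")
    index = RLI(timeout=300)
    index.receive_update(site_a, now=0)
    index.receive_update(site_b, now=200)
    print(index.lookup("lfn://run1", now=400))  # lrc-a's report has expired
```

Because the index is only loosely consistent, a client should treat an RLI answer as a hint and then ask the named LRC for the actual physical file names.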


Who’s Using RLS?
• Used with Pegasus and Chimera:
  – LIGO
  – Atlas high energy physics application
  – Southern California Earthquake Center (SCEC)
  – Astronomy: Montage and Galaxy Morphology applications
  – Bioinformatics
  – Tomography
• Other RLS users
  – QCD Grid
  – US CMS experiment (integrated with POOL)


Distributed File Systems
• What everyone would like
• Hard to implement
• Features that are needed
  – Performance
  – Fault tolerance
  – Security
  – Fine-grained authorization
  – Access via Unix file system libraries and programs
  – User-defined metadata
    • Some would like this


Example Distributed File Systems
• AFS & DFS
  – Kerberos for security
  – Performance and fault tolerance problems
• NFS
  – Performance, security, and fault tolerance problems
• NFSv4
  – Tries to improve performance and security
• GridNFS
  – Univ. of Michigan
  – Extends NFSv4
  – Adds grid security and improves performance
• IBM GPFS
  – Originally designed as a cluster parallel file system
  – Being used in distributed environments
  – Relatively large hardware requirements


Summary
• Grid Monitoring
  – Ganglia
  – MonALISA
  – Nagios
  – Others
• Workflow
  – Condor DAGMan (and Condor-G)
  – Pegasus
• Data
  – Storage Resource Broker
  – Replica Location Service
  – Distributed file systems
