December 8 & 9, 2005, Austin, TX
SURA Cyberinfrastructure Workshop Series: Grid Technology: The Rough Guide

High Level Grid Services

Warren Smith
Texas Advanced Computing Center
University of Texas

Outline
• Grid Monitoring
– Ganglia
– MonALISA
– Nagios
– Others
• Workflow
– Condor DAGMan (and Condor-G)
– Pegasus
• Data
– Storage Resource Broker
– Replica Location Service
– Distributed file systems

Other High Level Services (Not Covered)
• Resource Brokering
• Metascheduling
– GRMS, MARS
• Credential issuance
– PURSE, GAMA
• Authorization
– Shibboleth
– VOMS
– CAS

Grid Monitoring
• Ganglia
• MonALISA
• Nagios
• Others

Ganglia
http://ganglia.sourceforge.net
• Monitors clusters and aggregations of clusters
• Collects system status information
– Provided in XML documents
– Provided graphically via a web interface
• Can be subscribed to and aggregated across multiple clusters
• Focuses on simplicity and performance
– Can monitor 1000s of systems
• MDS and MonALISA can consume information provided by Ganglia
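Because Ganglia publishes its state as XML, a consumer such as MDS or MonALISA needs little more than a socket and an XML parser. The following is a minimal Python sketch, assuming gmond answers TCP connections on its usual port 8649 with a GANGLIA_XML document containing CLUSTER, HOST, and METRIC elements (NAME and VAL attributes). The port, the trimmed sample document, the host name, and the helper names are assumptions to check against your Ganglia version.

```python
import socket
import xml.etree.ElementTree as ET


def fetch_gmond_xml(host="localhost", port=8649, timeout=5.0):
    """Read the XML document a gmond writes when a client connects over TCP.

    Assumption: gmond listens for TCP requests on `port` and its
    configuration file allows requests from this host.
    """
    chunks = []
    with socket.create_connection((host, port), timeout=timeout) as sock:
        while True:
            data = sock.recv(65536)
            if not data:  # gmond closes the connection after sending the XML
                break
            chunks.append(data)
    return b"".join(chunks)


def parse_metrics(xml_doc):
    """Return {host_name: {metric_name: value_string}} from Ganglia XML (str or bytes)."""
    root = ET.fromstring(xml_doc)
    hosts = {}
    for host in root.iter("HOST"):  # iter() also finds hosts nested under GRID/CLUSTER
        hosts[host.get("NAME")] = {m.get("NAME"): m.get("VAL") for m in host.iter("METRIC")}
    return hosts


# Invented, trimmed sample in the assumed gmond layout
SAMPLE = """<GANGLIA_XML VERSION="3.0.0" SOURCE="gmond">
  <CLUSTER NAME="demo">
    <HOST NAME="node1" IP="10.0.0.1">
      <METRIC NAME="load_one" VAL="0.42" TYPE="float" UNITS=""/>
      <METRIC NAME="cpu_num" VAL="2" TYPE="uint16" UNITS="CPUs"/>
    </HOST>
  </CLUSTER>
</GANGLIA_XML>"""

if __name__ == "__main__":
    print(parse_metrics(SAMPLE))
    # Against a live gmond (hypothetical host name):
    # print(parse_metrics(fetch_gmond_xml("node1.example.org")))
```

Values are left as strings because the XML also carries a TYPE attribute; a consumer can convert them per metric.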

gmond
• Ganglia Monitoring Daemon
• Runs on each resource being monitored
• Collects a standard set of information
• Configuration file specifies
– When to collect information
– When to send it (based on time and/or change)
– Who to send to
– Who to allow to request information
• Supports UDP unicast, UDP multicast, TCP
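Sending "based on time and/or change" can be sketched as a small decision rule: send when the value has moved by more than a threshold, or when too long has passed since the last send, so receivers know the host is still alive. This is a minimal Python illustration of that logic only, not gmond's implementation; the class and the value_threshold/time_threshold parameter names are hypothetical stand-ins for whatever your gmond configuration file uses.

```python
import time


class MetricSender:
    """Decide when a gmond-style agent should send a metric value (illustrative)."""

    def __init__(self, value_threshold, time_threshold):
        self.value_threshold = value_threshold  # minimum change worth reporting
        self.time_threshold = time_threshold    # maximum seconds between sends
        self.last_value = None
        self.last_sent = None

    def should_send(self, value, now=None):
        now = time.time() if now is None else now
        if self.last_sent is None:
            send = True  # never sent before: always send the first sample
        else:
            changed = abs(value - self.last_value) > self.value_threshold
            stale = now - self.last_sent >= self.time_threshold
            send = changed or stale
        if send:
            self.last_value, self.last_sent = value, now
        return send


if __name__ == "__main__":
    sender = MetricSender(value_threshold=1.0, time_threshold=60)
    for t, v in [(0, 10.0), (20, 10.5), (40, 12.0), (90, 12.1), (110, 12.2)]:
        print(t, v, sender.should_send(v, now=t))
```

With these settings the samples at t=0 (first), t=40 (changed by 2.0), and t=110 (70 s since the last send) are sent; the others are suppressed.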

Information collected by gmond
Platforms (of 8) = number of the listed platforms (Linux, FreeBSD, Solaris, AIX, MacOS X, IRIX, HPUX, Tru64) marked as providing the metric

Name | Description | Platforms (of 8)
boottime | System boot timestamp | 8
bread_sec | Buffer reads per second | 1
bwrite_sec | Buffer writes per second | 1
bytes_in | Number of bytes in per second | 3
bytes_out | Number of bytes out per second | 3
cpu_aidle | Percent of time since boot idle CPU | 2
cpu_idle | Percent CPU idle | 8
cpu_intr | Time spent processing interrupts | 1
cpu_nice | Percent CPU nice | 4
cpu_num | Number of CPUs | 8
cpu_speed | Speed in MHz of CPU | 8
cpu_ssys | Time in kernel mode | 1
cpu_system | Percent CPU system | 7
cpu_user | Percent CPU user | 7
cpu_wait | Time spent waiting | 1
cpu_wio | Time spent in I/O wait | 1
disk_free | Total free disk space | 2
disk_total | Total available disk space | 2
load_fifteen | Fifteen-minute load average | 8
load_five | Five-minute load average | 8
load_one | One-minute load average | 8
location | GPS coordinates for host | 8
lread_sec | Linear reads per second | 1
lwrite_sec | Linear writes per second | 1
machine_type | Machine hardware (uname -m) | 8
mem_arm | Available real memory | 1
mem_avm | Available virtual memory | 1
mem_buffers | Amount of buffered memory | 4
mem_cached | Amount of cached memory | 4
mem_free | Amount of available memory | 8
mem_rm | Total real memory | 1
mem_shared | Amount of shared memory | 4
mem_total | Total memory | 8
mem_vm | Total virtual memory | 1
mtu | Network maximum transmission unit | 8
os_name | Operating system name | 8
os_release | Operating system release (version) | 8
part_max_used | Maximum percent used for all partitions | 2
phread_sec | Physical reads per second | 1
phwrite_sec | Physical writes per second | 1
pkts_in | Packets in per second | 3
pkts_out | Packets out per second | 3
proc_run | Total number of running processes | 5
proc_total | Total number of processes | 6
rcache | Read cache hit ratio | 1
swap_free | Amount of available swap memory | 7
swap_total | Total amount of swap memory | 7
sys_clock | Current time on host | 8
wcache | Write cache hit ratio | 1

gmetric
• Program to provide custom information to Ganglia
– e.g. CPU temperature, batch queue length
• Uses the gmond configuration file to determine who to send to
• Executed as a cron job
– Execute command(s) to gather the data
– Execute gmetric to send the data
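The cron-job pattern (gather the data, then run gmetric) can be sketched in Python. This is a minimal sketch, assuming gmetric accepts the long options --name, --value, --type, and --units (check `gmetric --help` on your installation); the metric, free space in a hypothetical scratch area, is an invented example, and the script only prints the command unless dry_run is turned off.

```python
import shutil
import subprocess


def build_gmetric_cmd(name, value, metric_type="float", units=""):
    """Build a gmetric command line for one custom metric."""
    return [
        "gmetric",
        f"--name={name}",
        f"--value={value}",
        f"--type={metric_type}",
        f"--units={units}",
    ]


def scratch_free_gb(path="/tmp"):
    """Gather the data: free space in a (hypothetical) scratch area, in GB."""
    return round(shutil.disk_usage(path).free / 1e9, 2)


def report(path="/tmp", dry_run=True):
    """Gather the value, then hand it to gmetric (or just print the command)."""
    cmd = build_gmetric_cmd("scratch_free", scratch_free_gb(path), "float", "GB")
    if dry_run:
        print(" ".join(cmd))
    else:
        # gmetric reads the gmond configuration file to decide where to send
        subprocess.run(cmd, check=True)
    return cmd


if __name__ == "__main__":
    report(dry_run=True)
```

With dry_run=False, a crontab entry such as `*/5 * * * * /usr/bin/python3 /opt/monitor/report_scratch.py` (hypothetical path) would publish the value every five minutes.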

gmetad
• Aggregates information from gmonds
• Configuration file specifies which gmonds to get data from
– Connects to gmonds using TCP
• Stores information in Round Robin Database (RRD)
– Small database where data for each attribute is stored in time order
– Maximum size
– Oldest data is forgotten
• PHP scripts to display RRD data as web pages
– Graphs over time
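The round-robin idea (time-ordered samples, a fixed maximum size, oldest data forgotten) maps directly onto a bounded queue. This is a simplified Python sketch, not RRDtool: real RRD files also consolidate samples into coarser archives (averages, maxima), which is omitted here, and the class name is hypothetical.

```python
from collections import deque


class RoundRobinArchive:
    """Fixed-size, time-ordered store for one attribute (simplified RRD idea)."""

    def __init__(self, max_samples):
        # A deque with maxlen silently drops the oldest entry when full
        self.samples = deque(maxlen=max_samples)

    def update(self, timestamp, value):
        if self.samples and timestamp <= self.samples[-1][0]:
            raise ValueError("timestamps must increase")
        self.samples.append((timestamp, value))

    def fetch(self, start, end):
        """Return the retained (timestamp, value) pairs within [start, end]."""
        return [(t, v) for t, v in self.samples if start <= t <= end]


if __name__ == "__main__":
    rra = RoundRobinArchive(max_samples=3)
    for i in range(5):
        rra.update(i * 15, i * 0.5)  # one sample every 15 seconds
    print(rra.fetch(0, 100))  # only the three newest samples survive
```

Because the size never grows, storage per attribute is bounded no matter how long gmetad runs, which is what makes the approach suitable for monitoring thousands of systems.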

Who’s Using Ganglia?
• PlanetLab
• Lots of clusters
– SDSC
– NASA Goddard
– Naval Research Lab
– …

MonALISA
http://monalisa.cacr.caltech.edu
• Distributed monitoring system
• Agent-based design
• Written in Java
• Uses Jini & SOAP/WSDL
– For locating services and communicating
• Gathers information using other systems
– SNMP, Ganglia, MRTG, Hawkeye, custom
• Clients
– Locate and subscribe to services that provide monitoring information
– GUI client, web client, administrative client

Monitoring I2 Network Traffic, Grid03 Farms and Jobs

MonALISA Services
• Autonomous, self-describing services
– Built on a generic Dynamic Distributed Services Architecture
• Each monitoring service stores data in a relational database
• Automatic update of monitoring services
• Lookup discovery service

Who’s Using MonALISA?
• Open Science Grid
– Included in the Virtual Data Toolkit
• Internet2
• Abilene
• Compact Muon Solenoid
• Many others

Nagios Overview
• A monitoring framework
– Configurable
– Extensible
• Provides a relatively comprehensive set of functionality
• Supports distributed monitoring
• Supports taking actions in addition to monitoring
• Large community using and extending it

• Doesn’t store historical data in a true database
• Quality of add-ons varies

Nagios Architecture
[Figure: Each remote system runs Nagios, Nagios plugins, and send_nsca, configured by its own Nagios configuration files. The central collector runs Nagios with NSCA, Nagios plugins, Nagios configuration files, and Nagios log files; the Nagios CGIs are served through httpd.]

Nagios Features I
• Web interface
– Current status, graphs
• Monitoring
– Monitoring of a number of properties is included
– People provide plugins to monitor other properties, and we can do the same
– Periodic monitoring with user-defined periods
• Thresholds to indicate problems
• Actions when problems occur
– Notification (email, page, extensible)
– Actions to attempt to fix the problem (e.g. restart a daemon)
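Nagios plugins follow a simple convention: the plugin prints one status line, optionally followed by `|` and performance data, and its exit code (0 OK, 1 WARNING, 2 CRITICAL, 3 UNKNOWN) tells Nagios the state. Below is a minimal sketch of a threshold-based plugin in Python, assuming a Unix host where os.getloadavg() is available; the plugin itself and its -w/-c options are illustrative, not one of the standard Nagios plugins.

```python
import argparse
import os

# Exit codes defined by the Nagios plugin conventions
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a measured value onto a Nagios state using warning/critical thresholds."""
    if value >= crit:
        code = CRITICAL
    elif value >= warn:
        code = WARNING
    else:
        code = OK
    # One status line; the text after '|' is performance data (label=value;warn;crit)
    message = f"LOAD {LABELS[code]} - load1={value:.2f} | load1={value:.2f};{warn};{crit}"
    return code, message


def main(argv=None):
    parser = argparse.ArgumentParser(description="Check the 1-minute load average")
    parser.add_argument("-w", "--warning", type=float, default=4.0)
    parser.add_argument("-c", "--critical", type=float, default=8.0)
    args = parser.parse_args(argv)
    try:
        load1 = os.getloadavg()[0]
    except OSError:
        print("LOAD UNKNOWN - load average not available")
        return UNKNOWN
    code, message = evaluate(load1, args.warning, args.critical)
    print(message)
    return code


if __name__ == "__main__":
    # Demo: show the code a real plugin would hand to Nagios via sys.exit(main())
    print("exit code:", main(["-w", "4", "-c", "8"]))
```

Installed as a plugin, the script would end with `sys.exit(main())` so that Nagios receives the return code and applies its notification and recovery actions.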

Nagios Features II
• Escalations
– If a problem occurs n times, do x (e.g. attempt to fix automatically)
– If a problem occurs more than n times, do y (e.g. open a ticket in the trouble ticket system)
– …
• Distributed monitoring
– A Nagios daemon can test things all over
– Can also have Nagios daemons on multiple systems
– Certain daemons can act as central collection points
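The escalation rule amounts to counting consecutive failures and firing each action once when its threshold is reached. A minimal Python sketch; the classes, thresholds, and actions are hypothetical, and Nagios itself expresses escalations declaratively in its configuration files rather than in code.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Escalation:
    threshold: int                # consecutive failures that trigger this step
    action: Callable[[str], str]  # hypothetical action; returns a description


@dataclass
class ProblemTracker:
    escalations: List[Escalation]
    consecutive_failures: int = 0
    log: List[str] = field(default_factory=list)

    def record(self, service: str, ok: bool) -> None:
        if ok:
            self.consecutive_failures = 0  # recovery resets the count
            return
        self.consecutive_failures += 1
        for esc in self.escalations:
            if self.consecutive_failures == esc.threshold:  # fire once per outage
                self.log.append(esc.action(service))


if __name__ == "__main__":
    tracker = ProblemTracker([
        Escalation(3, lambda s: f"restart {s}"),          # n failures: try to fix
        Escalation(5, lambda s: f"open ticket for {s}"),  # more than n: ticket
    ])
    for ok in [False] * 6:
        tracker.record("gridftp", ok)
    print(tracker.log)
```

Firing on equality rather than "greater than or equal" keeps a long outage from restarting the daemon or opening a ticket on every check.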

Who’s Using Nagios?
• It’s included in a number of Unix distributions
– Debian
– SUSE
– Gentoo
– OpenBSD
• Nagios users can register with the site
– 986 sites have registered
– ~200,000 hosts monitored
– ~720,000 services monitored

TeraGrid’s Inca
• Hierarchical status monitoring
– Groups tests into logical sets
– Supports many levels of detail and summarization
• Flexible, scalable architecture
– Very simple reporter API
– Can use existing test scripts (unit tests, status tools)
– Hierarchical controllers
– Several query/display tools

And Many Others…
• SNMP
– OpenNMS
– HP OpenView
• Big Brother / Big Sister
• Globus MDS
• ACDC (U Buffalo)
• GridCat
• GPIR (TACC)
• …

Workflow
• Condor DAGMan
– Starting with Condor-G
• Pegasus

Workflow Definition
• Set of tasks with dependencies
• Tasks can be anything, but in grids they typically:
– Execute programs
– Move data
• Dependencies can be
– Control: “do T2 after T1 finishes”
– Data: “T2 input 1 comes from T1 output 1”
• Can be acyclic or have cycles/iterations
• Can have conditional execution
• A large variety of types of workflows
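The two kinds of dependency are related: a data dependency ("T2 input 1 comes from T1 output 1") implies a control dependency ("do T2 after T1 finishes"). A minimal Python sketch of that derivation, assuming an acyclic workflow without conditionals and using the standard-library graphlib module (Python 3.9+); the task table and file names are hypothetical.

```python
from graphlib import TopologicalSorter

# Hypothetical workflow: each task lists the files it reads and writes
TASKS = {
    "T1": {"inputs": ["raw.dat"], "outputs": ["clean.dat"]},
    "T2": {"inputs": ["clean.dat"], "outputs": ["stats.txt"]},
    "T3": {"inputs": ["clean.dat"], "outputs": ["plot.png"]},
    "T4": {"inputs": ["stats.txt", "plot.png"], "outputs": ["report.pdf"]},
}


def control_dependencies(tasks):
    """Derive {task: set of tasks it must wait for} from data dependencies."""
    producer = {f: name for name, spec in tasks.items() for f in spec["outputs"]}
    deps = {name: set() for name in tasks}
    for name, spec in tasks.items():
        for f in spec["inputs"]:
            if f in producer:  # files nobody produces are external inputs
                deps[name].add(producer[f])
    return deps


def execution_waves(deps):
    """Group tasks into waves; tasks in the same wave can run in parallel."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the workflow has a cycle
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())
        waves.append(ready)
        sorter.done(*ready)
    return waves


if __name__ == "__main__":
    print(execution_waves(control_dependencies(TASKS)))
```

Workflows with cycles or conditional execution need a richer model than this acyclic sketch.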

Condor-G: Condor + Globus
http://www.cs.wisc.edu/condor
• Submit your jobs to Condor
– Jobs say they want to run via Globus
• Condor manages your jobs
– Queuing, fault tolerance
• Submits jobs to resources via Globus

Globus Universe
• Condor has a number of universes
– Standard: to take advantage of features like checkpointing and redirecting file I/O
– Vanilla: to run jobs without the frills
– Java: to run Java codes
• Globus universe to run jobs via Globus
– Universe = Globus
– Which Globus Gatekeeper to use
– Optional: location of the file containing your Globus certificate

# Run this job through Globus rather than in the local Condor pool
universe = globus
# Globus gatekeeper (and jobmanager) that receives the job
globusscheduler = beak.cs.wisc.edu/jobmanager
executable = progname
queue

How Condor-G Works
[Figure: a Personal Condor, running a Schedd, on one side; a Globus Resource, running LSF, on the other]
• Personal Condor (Schedd): queues, submits, and manages jobs
– Available commands: condor_submit, condor_rm, condor_q, condor_hold, …
• LSF: manages cluster resources

How Condor-G Works (continued)
[Figure, built up over several slides: 600 Globus jobs are queued in the Schedd of the Personal Condor; the Schedd starts a GridManager, which contacts a JobManager on the Globus Resource; the JobManager submits the User Job to LSF.]

Globus Universe Fault Tolerance
• Submit-side failure:
– All relevant state for each submitted job is stored persistently in the Condor job queue.
– This persistent information allows the Condor GridManager, upon restart, to read the state information and reconnect to JobManagers that were running at the time of the crash.
• Execute side:
– Condor worked with Globus to improve fault tolerance
• X.509 proxy expiration
– Condor can put jobs on hold and email the user to refresh the proxy

Condor DAGMan
• Directed Acyclic Graph Manager
• DAGMan allows you to specify the dependencies between your Condor jobs, so it can manage them automatically for you
– e.g. “Don’t run job B until job A has completed successfully.”

What is a DAG?
• A DAG is the data structure used by DAGMan to represent these dependencies.
• Each job is a “node” in the DAG.
• Each node can have any number of “parent” or “child” nodes, as long as there are no loops!
[Figure: a diamond-shaped DAG in which Job A is the parent of Job B and Job C, which are both parents of Job D]

Defining a DAG
• A DAG is defined by a .dag file, listing each of its nodes and their dependencies:

# diamond.dag
Job A a.sub
Job B b.sub
Job C c.sub
Job D d.sub
# B and C wait for A; D waits for both B and C
Parent A Child B C
Parent B C Child D

• Each node will run the Condor job specified by its accompanying Condor submit file
• Each node can have a pre and post step
[Figure: the diamond DAG, with Job A at the top, Job B and Job C in the middle, and Job D at the bottom]
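The .dag syntax shown here is simple enough to parse in a few lines, which makes it easy to see what the file encodes. The Python sketch below, using the standard-library graphlib module (3.9+), reads only the Job and Parent/Child lines and computes one order consistent with the dependencies. It is a hypothetical illustration, not how DAGMan is implemented, and it ignores other DAGMan keywords.

```python
from graphlib import TopologicalSorter

DIAMOND = """\
# diamond.dag
Job A a.sub
Job B b.sub
Job C c.sub
Job D d.sub
Parent A Child B C
Parent B C Child D
"""


def parse_dag(text):
    """Parse Job and Parent ... Child ... lines (keywords matched case-insensitively).

    Returns (submit_files, parents), where parents maps node -> set of parent nodes.
    Other DAGMan keywords are ignored in this sketch.
    """
    submit_files, parents = {}, {}
    for line in text.splitlines():
        words = line.split()
        if not words or words[0].startswith("#"):
            continue  # blank line or comment
        keyword = words[0].upper()
        if keyword == "JOB":
            name, submit = words[1], words[2]
            submit_files[name] = submit
            parents.setdefault(name, set())
        elif keyword == "PARENT":
            split = [w.upper() for w in words].index("CHILD")
            for child in words[split + 1:]:
                parents.setdefault(child, set()).update(words[1:split])
    return submit_files, parents


if __name__ == "__main__":
    files, parents = parse_dag(DIAMOND)
    # One valid order: A first, D last, B and C in either order
    print(list(TopologicalSorter(parents).static_order()))
```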

Submitting a DAG
• To start your DAG, just run condor_submit_dag with your .dag file, and Condor will start a personal DAGMan daemon to begin running your jobs:

% condor_submit_dag diamond.dag

• condor_submit_dag submits a Scheduler Universe job with DAGMan as the executable.
• Thus the DAGMan daemon itself runs as a Condor job, so you don’t have to baby-sit it.

Running a DAG
• DAGMan manages the submission of your jobs to Condor based on the DAG dependencies
– Throttling of job submission can be configured
• In case of a failure, DAGMan creates a “rescue” file with the current state of the DAG
– Failures can be retried a configurable number of times
– The rescue file can be used to restore the prior state of the DAG when restarting
• Once the DAG is complete, the DAGMan job itself is finished and exits
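Throttling, retries, and rescue can be illustrated with a toy scheduler. This Python sketch is a simulation, not DAGMan: run_node is a hypothetical stand-in for submitting a Condor job and waiting for it, and the list of finished nodes plays the role of the rescue file. In DAGMan itself these behaviors are configured through options such as condor_submit_dag's -maxjobs and the RETRY keyword in the .dag file, and the rescue file format depends on the Condor version.

```python
from graphlib import TopologicalSorter

DIAMOND = {"A": set(), "B": {"A"}, "C": {"A"}, "D": {"B", "C"}}


def run_dag(parents, run_node, max_jobs=2, max_retries=1, already_done=()):
    """Toy DAGMan-like executor.

    parents maps node -> set of parent nodes. At most max_jobs ready nodes are
    started per round (throttling); a failing node is retried up to
    max_retries extra times. Returns (succeeded, finished_nodes); on failure,
    finished_nodes is what a rescue file would record.
    """
    sorter = TopologicalSorter(parents)
    sorter.prepare()
    finished, pending = [], []
    while sorter.is_active():
        pending.extend(sorted(sorter.get_ready()))
        batch, pending = pending[:max_jobs], pending[max_jobs:]
        for node in batch:
            if node not in already_done:  # nodes recorded in a rescue file are skipped
                attempts = 0
                while not run_node(node):
                    attempts += 1
                    if attempts > max_retries:
                        return False, finished
            finished.append(node)
            sorter.done(node)
    return True, finished


if __name__ == "__main__":
    calls = []

    def flaky(node):  # hypothetical job runner in which node C always fails
        calls.append(node)
        return node != "C"

    ok, rescue = run_dag(DIAMOND, flaky)
    print(ok, rescue, calls)
    # Restart from the "rescue" state: A and B are not run again
    print(run_dag(DIAMOND, lambda n: True, already_done=set(rescue)))
```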

Who’s Using Condor-G & DAGMan?
• Pegasus
• LIGO, Atlas, CMS, …
• gLite
• TACC
• DAGMan available on every Condor pool

Pegasus
http://pegasus.isi.edu
• Pegasus: Planning for Execution on Grids
– Intelligently decides how to run a workflow on a grid
• Takes as input an abstract workflow
– Abstract DAG in XML (DAX)
• Generates a concrete workflow
– Selects computer systems (MDS)
– Selects file replicas (RLS)
• Executes the workflow (Condor DAGMan)

[Figure: Scientific Analysis Workflow Evolution. Stages: Construct the Analysis; Select the Input Data; Map the Workflow onto Available Resources; Execute the Workflow. Artifacts: Workflow Template, Abstract Workflow, Concrete Workflow, Tasks to be executed, Grid Resources. Components: Science Gateway, Pegasus, Condor.]

Pegasus Workflows
• Abstract workflow
– Edges are data dependencies
– Implicit data movement
– Processing on the data
• Concrete workflow
– Edges are control flow
– Explicit data movement as tasks
• Acyclic
• Supports parallelism
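The step from abstract to concrete workflow can be sketched: abstract tasks mention only logical files, and the planner picks a site, picks a replica for each external input, and adds explicit transfer tasks. A minimal Python illustration; the workflow, replica table, host names, and the naive "single site, first replica" policy are all hypothetical, and Pegasus' real planner is considerably more sophisticated.

```python
# Hypothetical abstract workflow: tasks refer to logical file names only
ABSTRACT = {
    "preprocess": {"inputs": ["raw.fits"], "outputs": ["clean.fits"]},
    "analyze": {"inputs": ["clean.fits"], "outputs": ["result.tbl"]},
}
# Hypothetical replica catalog (the role RLS plays): logical name -> physical URLs
REPLICAS = {"raw.fits": ["gsiftp://archive.example.org/data/raw.fits"]}


def concretize(abstract, site, replicas, workdir):
    """Map an abstract workflow onto one site, making data movement explicit.

    Assumes the dict lists tasks in a valid execution order. Site selection is
    fixed to `site`; a real planner would choose resources (e.g. using MDS)
    and would also stage final outputs out to storage.
    """
    produced = {f for spec in abstract.values() for f in spec["outputs"]}
    steps = []
    for task, spec in abstract.items():
        for lfn in spec["inputs"]:
            if lfn not in produced:  # external input: add an explicit transfer task
                source = replicas[lfn][0]  # naive replica choice: the first one
                steps.append(("transfer", source, f"gsiftp://{site}{workdir}/{lfn}"))
        steps.append(("run", task, site))
    return steps


if __name__ == "__main__":
    for step in concretize(ABSTRACT, "cluster.example.edu", REPLICAS, "/scratch/run1"):
        print(step)
```

The ordered list of steps is the simplest form of control flow; a planner emitting a DAG would also keep independent tasks unordered so they can run in parallel.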

Who’s Using Pegasus?
• LIGO
• Atlas high energy physics application
• Southern California Earthquake Center (SCEC)
• Astronomy: Montage and Galaxy Morphology applications
• Bioinformatics
• Tomography

Data
• Storage Resource Broker
• Replica Location Service

Storage Resource Broker (SRB)
http://www.sdsc.edu/srb
• Manages collections of data
– In many cases, the data are files
• Provides a logical namespace
• Maps logical names to physical instances
• Associates metadata with logical names
– Metadata Catalog (MCAT)
• Interfaces to a variety of storage
– Local disk
– Parallel file systems
– Archives
– Databases
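The core catalog idea (one logical name mapped to one or more physical instances, with metadata attached to the logical name) can be sketched as an in-memory structure. This is a hypothetical Python illustration, not the SRB or MCAT API; the class, resource names, and paths are invented, and metadata is stored as attribute-value-unit triplets.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class LogicalObject:
    replicas: List[Tuple[str, str]] = field(default_factory=list)       # (resource, physical path)
    metadata: List[Tuple[str, str, str]] = field(default_factory=list)  # (attribute, value, unit)


class ToyCatalog:
    """Hypothetical in-memory stand-in for a metadata catalog such as MCAT."""

    def __init__(self) -> None:
        self.objects: Dict[str, LogicalObject] = {}

    def register(self, logical_name, resource, physical_path):
        """Record one more physical instance of a logical name."""
        obj = self.objects.setdefault(logical_name, LogicalObject())
        obj.replicas.append((resource, physical_path))

    def add_metadata(self, logical_name, attribute, value, unit=""):
        self.objects[logical_name].metadata.append((attribute, value, unit))

    def resolve(self, logical_name):
        """Logical-to-physical mapping: all known instances of the object."""
        return list(self.objects[logical_name].replicas)

    def query(self, attribute, value):
        """Logical names whose metadata contains the given attribute/value pair."""
        return sorted(
            name for name, obj in self.objects.items()
            if any(a == attribute and v == value for a, v, _ in obj.metadata)
        )


if __name__ == "__main__":
    cat = ToyCatalog()
    cat.register("/demo/home/run42.dat", "archive", "/hpss/u1/run42.dat")
    cat.register("/demo/home/run42.dat", "cluster-disk", "/scratch/run42.dat")
    cat.add_metadata("/demo/home/run42.dat", "temperature", "300", "K")
    print(cat.resolve("/demo/home/run42.dat"))
    print(cat.query("temperature", "300"))
```

Users and applications only ever handle the logical name; which storage system actually serves the bytes is decided at resolve time.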

SRB Client Implementations
• A set of Basic APIs
– Over 160 APIs
– Used by all clients to make requests to servers
• Scommands
– Unix-like command line utilities for UNIX and Windows platforms
– Over 60: Sls, Scp, Sput, Sget, …

SRB Client Implementations (continued)
• inQ: Windows GUI browser
• Jargon: Java SRB client classes
– Pure Java implementation
• mySRB: web-based GUI
– Run using a web browser
• Java Admin Tool
– GUI for user and resource management
• Matrix: web service for SRB workflow

Example Read
[Figure: A read application asks an SRB server for a logical name. The MCAT catalog provides (1) logical-to-physical mapping, (2) identification of replicas, and (3) access and audit control. SRB servers perform peer-to-peer brokering, and an SRB agent performs the data access on replica R1 or R2.]

Authentication
• Grid Security Infrastructure
– PKI certificates
• Challenge-response mechanism
– No passwords sent over the network
• Ticket
– Valid for a specified time period or number of accesses
• Generic Security Service API
– Authentication of server to remote storage
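A ticket that is "valid for a specified time period or number of accesses" reduces to two checks at redemption time. A minimal Python sketch; the TicketStore class and its parameters are hypothetical and do not reflect SRB's actual ticket implementation. The clock is injectable so the expiry logic can be exercised without waiting.

```python
import secrets
import time


class TicketStore:
    """Hypothetical access tickets limited by expiry time and/or use count."""

    def __init__(self, clock=time.time):
        self.clock = clock
        self.tickets = {}

    def issue(self, path, lifetime_s=None, max_uses=None):
        ticket = secrets.token_urlsafe(16)  # unguessable ticket string
        expires = None if lifetime_s is None else self.clock() + lifetime_s
        self.tickets[ticket] = {"path": path, "expires": expires, "uses_left": max_uses}
        return ticket

    def redeem(self, ticket, path):
        """Return True and consume one use if the ticket grants access to path."""
        t = self.tickets.get(ticket)
        if t is None or t["path"] != path:
            return False
        if t["expires"] is not None and self.clock() >= t["expires"]:
            del self.tickets[ticket]  # expired: forget it
            return False
        if t["uses_left"] is not None:
            if t["uses_left"] <= 0:
                return False
            t["uses_left"] -= 1
        return True


if __name__ == "__main__":
    store = TicketStore()
    tk = store.issue("/demo/home/run42.dat", max_uses=2)
    print([store.redeem(tk, "/demo/home/run42.dat") for _ in range(3)])
```

Because the holder of a valid ticket needs no certificate or account, limiting its lifetime or number of uses bounds the damage if it leaks.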

Authorization
• Collection-owned data
– At each remote storage system, an account ID is created under which the data grid stores files
• User authenticates to SRB
• SRB checks access controls
• SRB server authenticates to a remote SRB server
• Remote SRB server authenticates to the remote storage repository

Metadata in SRB
• SRB System Metadata
• Free-form Metadata (User-defined)
– Attribute-Value-Unit Triplets…
• Extensible Schema Metadata
– User Defined
– Tables integrated into MCAT Core Schema
• External Database
• Metadata operations
– Metadata insertion through user interfaces
– Bulk metadata insertion
– Template-based metadata extraction
– Query metadata through well-defined interfaces

Who’s Using SRB?
• Very large number of users
• A sample:
– National Virtual Observatory
– Large Hadron Collider
– NASA
– NCAR
– BIRN

Replica Location Service (RLS)
http://www.globus.org/toolkit/data/rls/
• Maintains a mapping from logical file names to physical file names
– 1 logical file to 1+ physical files
• Improves performance and fault tolerance when accessing data
• Supports user-defined attributes of logical files
• Component of Globus toolkit
– WS-RF service
• RLS was designed and implemented in a collaboration between the Globus project and the EU DataGrid project

Replica Location Service In Context
[Figure: the Replica Location Service shown alongside the Reliable Data Transfer Service, GridFTP, a Reliable Replication Service, Replica Consistency Management Services, and a Metadata Service]
• RLS is one component in a data management architecture
• Provides a simple, distributed registry of mappings
• Consistency management provided by higher-level services

RLS Features
[Figure: Replica Location Index (RLI) nodes above several Local Replica Catalogs (LRCs)]
• Local Replica Catalogs (LRCs) contain consistent information about logical-to-target mappings
• Replica Location Index (RLI) nodes aggregate information about one or more LRCs
• LRCs use soft state update mechanisms to inform RLIs about their state: relaxed consistency of index
• Optional compression of state updates reduces communication, CPU and storage overheads
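Soft state and compressed updates can be sketched together: each LRC periodically sends a compact summary of its logical names, and the RLI forgets summaries that are not refreshed within a timeout. This Python sketch uses a Bloom filter as the compression scheme (a compact set summary that can yield false positives but never false negatives), so a lookup returns LRCs that may hold a mapping and must then be asked directly. Class names, sizes, and the timeout are hypothetical.

```python
import hashlib
import time


class BloomFilter:
    """Compact set summary: may report false positives, never false negatives."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size = size_bits
        self.k = num_hashes
        self.bits = bytearray((size_bits + 7) // 8)

    def _positions(self, item):
        for i in range(self.k):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item):
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def summarize(logical_names):
    """LRC side: compress the catalog's logical names into one Bloom filter."""
    bf = BloomFilter()
    for lfn in logical_names:
        bf.add(lfn)
    return bf


class ReplicaLocationIndex:
    """RLI side: one soft-state summary per LRC, dropped if not refreshed."""

    def __init__(self, timeout_s, clock=time.time):
        self.timeout = timeout_s
        self.clock = clock
        self.summaries = {}  # lrc name -> (bloom filter, time received)

    def update(self, lrc_name, summary):
        self.summaries[lrc_name] = (summary, self.clock())

    def lookup(self, lfn):
        """LRCs that may hold mappings for lfn; the client then asks those LRCs."""
        now = self.clock()
        self.summaries = {
            name: entry for name, entry in self.summaries.items()
            if now - entry[1] < self.timeout  # expire stale soft state
        }
        return sorted(name for name, (bf, _) in self.summaries.items() if bf.might_contain(lfn))


if __name__ == "__main__":
    rli = ReplicaLocationIndex(timeout_s=300)
    rli.update("lrc-a", summarize(["lfn://run42.dat", "lfn://run43.dat"]))
    rli.update("lrc-b", summarize(["lfn://calib.dat"]))
    print(rli.lookup("lfn://run42.dat"))
```

The relaxed consistency is visible here: between refreshes the index may be out of date, but the LRCs remain the authoritative source.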

Who’s Using RLS?
• Used with Pegasus and Chimera:
– LIGO
– Atlas high energy physics application
– Southern California Earthquake Center (SCEC)
– Astronomy: Montage and Galaxy Morphology applications
– Bioinformatics
– Tomography
• Other RLS users
– QCD Grid, US CMS experiment (integrated with POOL)

Distributed File Systems
• What everyone would like
• Hard to implement
• Features that are needed
– Performance
– Fault tolerance
– Security
– Fine-grained authorization
– Access via Unix file system libraries and programs
– User-defined metadata (some would like this)

Example Distributed File Systems
• AFS & DFS
– Kerberos for security
– Performance and fault tolerance problems
• NFS
– Performance, security, and fault tolerance problems
• NFSv4
– Tries to improve performance and security
• GridNFS
– Univ of Michigan
– Extends NFSv4
– Adds grid security and improves performance
• IBM GPFS
– Originally designed as a cluster parallel file system
– Being used in distributed environments
– Relatively large hardware requirements

Summary
• Grid Monitoring
– Ganglia
– MonALISA
– Nagios
– Others
• Workflow
– Condor DAGMan (and Condor-G)
– Pegasus
• Data
– Storage Resource Broker
– Replica Location Service
– Distributed file systems