globus toolkit developer tutorial: dataszkovacs/parhrendszseg/... · 2003-05-09 · may 9, 2003...

33
Data Management Globus Toolkit™ Developer Tutorial The Globus Project™ Argonne National Laboratory USC Information Sciences Institute http://www.globus.org/ Copyright (c) 2002 University of Chicago and The University of Southern California. All Rights Reserved. This presentation is licensed for use under the terms of the Globus Toolkit Public License. See http://www.globus.org/toolkit/download/license.html for the full text of this license.

Upload: others

Post on 13-Apr-2020

4 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

Data Management

Globus Toolkit™ Developer Tutorial

The Globus Project™Argonne National Laboratory

USC Information Sciences Institute

http://www.globus.org/

Copyright (c) 2002 University of Chicago and The University of Southern California. All Rights Reserved. This presentation is licensed for use under the terms of the Globus Toolkit Public License.See http://www.globus.org/toolkit/download/license.html for the full text of this license.

Page 2: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

May 9, 2003 2Globus Toolkit™ Developer Tutorial: Data

Data Management Services

Data transfer and access– GASS: Simple, multi-protocol file transfer

tools; integrated with GRAM– GridFTP: Provides high-performance, reliable

data transfer for modern WANs

Data replication and management– Replica Catalog: Provides a catalog service

for keeping track of replicated datasets– Replica Management: Provides services for

creating and managing replicated datasets

Page 3: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

May 9, 2003 3Globus Toolkit™ Developer Tutorial: Data

GASSRemote I/O and Staging

Used by GRAM to:– Pull executable from remote location

– Move stdin/stdout/stderr from/to a remote location

Access files from a remote location

Page 4: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

May 9, 2003 4Globus Toolkit™ Developer Tutorial: Data

What is GASS?Global Access to Secondary Storage

(a) GASS file access API– Replace open/close with globus_gass_open/close;

read/write calls can then proceed locally

(b) RSL extensions – URLs used to name executables, stdout, stderr

(c) Remote cache management utility– File cache: a local secondary storage area in the

execution machine where copies of remote files are stored

(d) Low-level APIs for specialized behaviors

Page 5: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

May 9, 2003 5Globus Toolkit™ Developer Tutorial: Data

GASS Architecture

Cache

GRAM

&(executable=https://…)

(b) RSL extensions

(d) Low-level APIs for customizing cache & GASS server

main( ) {fd = globus_gass_open(…)…read(fd,…)…globus_gass_close(fd)

}

GASS Server

HTTP Server(a) GASS file access API

FTP Server

Cache

(c) Remote cache management

% globus-gass-cache

Page 6: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

May 9, 2003 6Globus Toolkit™ Developer Tutorial: Data

GASS File Naming

URL encoding of resource nameshttps://quad.mcs.anl.gov:9991/~bester/myjob

protocol server address file name

Other exampleshttps://pitcairn.mcs.anl.gov/tmp/input_dataset.1

https://pitcairn.mcs.anl.gov:2222/./output_data

http://www.globus.org/~bester/input_dataset.2

Currently supports http & https

Future release will also support ftp & gridftp.

Page 7: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

May 9, 2003 7Globus Toolkit™ Developer Tutorial: Data

GASS RSL Extensions

executable, stdin, stdout, stderr can be local files or URLs

executable and stdin loaded into local cache before job begins (on front-end node)

stdout, stderr handled via GASS append mode

Cache cleaned after job completes

Page 8: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

May 9, 2003 8Globus Toolkit™ Developer Tutorial: Data

GASS/RSL Example

&(executable=https://quad:1234/~/myexe)(stdin=https://quad:1234/~/myin)(stdout=/home/bester/output)(stderr=https://quad:1234/dev/stdout)

Page 9: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

May 9, 2003 9Globus Toolkit™ Developer Tutorial: Data

Example GASS Applications

On-demand, transparent loading of data sets

Caching of (small) data sets

Automatic staging of code and data to remote supercomputers

(Near) real-time logging of application output to remote server

Page 10: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

May 9, 2003 10Globus Toolkit™ Developer Tutorial: Data

GASS File Access API

Minimum changes to application

globus_gass_open(), globus_gass_close()– Same as open(), close() but use URLs instead

of filenames

– Caches URL in case of multiple opens

– Return descriptors to files in local cache or sockets to remote server

Page 11: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

May 9, 2003 11Globus Toolkit™ Developer Tutorial: Data

GASS File Access API (cont)

Support for different access patterns– Read-only (from local cache)

– Write-only (to local cache)

– Write-only, append (to remote server)

In all cases the general assumption: there is no concurrent file access among several application programs

Page 12: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

May 9, 2003 12Globus Toolkit™ Developer Tutorial: Data

Read-only access

fileRemote location

Filecopy

File copy

Internet

P1 P2 P3 P4 P5

Read entire file

Site 1 Site 2

Multiple readers

Page 13: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

May 9, 2003 13Globus Toolkit™ Developer Tutorial: Data

Write-only access

FileRemote location

File File

Internet

P1 P2 P3 P4 P5

Write entire file

Site 1 Site 2Multiple writers: last writer wins

Page 14: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

May 9, 2003 14Globus Toolkit™ Developer Tutorial: Data

Append-only access

File

Internet

P1 P2 P3 P4 P5

Append block to file

Site 1 Site 2

Multiple writers:

Remote location

Page 15: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

May 9, 2003 15Globus Toolkit™ Developer Tutorial: Data

GASS File API Semantics

Copy on open to cache if – not write-only append and– not already in cache

Copy on close from cache if – not read-only and– no other copies open

Multiple globus_gass_open() calls share local copy of fileReference counting keeps track of open filesAppend to remote file if write-only append mode: e.g., for stdout and stderr

Page 16: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

May 9, 2003 16Globus Toolkit™ Developer Tutorial: Data

globus_gass_open()

Strategy: Fetch and cache on first read open

Download Fileinto cache

open cached file,add cachereference

URL in cache? no

yes

Optimized for parallel processing where several processes access the same file

Page 17: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

May 9, 2003 17Globus Toolkit™ Developer Tutorial: Data

GASS File OpenAdvantage:– The file is transferred only once even if it is used by

several processes

– The file in cache can be accessed locally by conventional I/O calls

Disadvantage if the file is too large:– Computation maybe delayed too long

– Local cache maybe too small to store the entire file

Solutions:– Prestaging

– Specialized GASS servers

Page 18: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

May 9, 2003 18Globus Toolkit™ Developer Tutorial: Data

globus_gass_close()

Strategy: Flush cache and transfer on last close (for a write file)

Remove cachereference

Upload changes

Modified no

yes

Page 19: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

May 9, 2003 19Globus Toolkit™ Developer Tutorial: Data

GASS File Close

Solution: The reference count is checked. – If it is one, the file is copied back to the remote

location and deleted from the cache

– Otherwise, the reference count is decremented

Advantage:– Reduces bandwidth requirements when multiple

processes at the same location write to the same file

– The file in cache can be accessed locally by conventional I/O calls

– Conflicts are resolved locally, not remotely, and the file is transferred only once

Page 20: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

May 9, 2003 20Globus Toolkit™ Developer Tutorial: Data

Special case: write-only append

Append to remote file if write-only-append mode: e.g., for stdout and stderrA remote file that is opened in write-only-append mode is not placed in the cacheRather:– a communication stream is created to the

remote location– Write operations to the file are translated into

communication operations on that stream

Page 21: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

May 9, 2003 21Globus Toolkit™ Developer Tutorial: Data

Append-only access

File

Internet

P1 P2 P3 P4 P5

Append block to file

Site 1 Site 2

Multiple writers:

Remote location

Page 22: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

May 9, 2003 22Globus Toolkit™ Developer Tutorial: Data

Remote Cache Management Utilities

Remote management of caches, for– Cache cleanup and management

Support operations on local & remote caches

Functionality encapsulated in a program: globus-gass-cache

Page 23: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

May 9, 2003 23Globus Toolkit™ Developer Tutorial: Data

GASS Cache Semantics

For each “file” in the cache, we record– Local file name– URL (i.e., the remote location)– Reference count: a set of tagged references

Tags associated with references allow clean up of cache, e.g. following failure– Tag is job_manager_contact (if file accessed

via file access API) or programmer-specified– Commands allow “remove all refs with tag T”

Page 24: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

May 9, 2003 24Globus Toolkit™ Developer Tutorial: Data

globus-gass-cache Specification

globus-gass-cache op [-r resource] [-t tag] URLWhere op is one of – add : add URL to cache with tag– delete : remove one reference of tag for URL– cleanup_tag : remove all refs of tag for URL– cleanup_url : remove specified URL from cache– list : list contents of cache

URL is optional for cleanup_tag and listIf resource not specified, default to local cache

Page 25: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

May 9, 2003 25Globus Toolkit™ Developer Tutorial: Data

globus-gass-cache Examples

globus-gass-cache add -t experiment1 x-gass://host:port/file1

Add file “file1” (located at x-gass://host:port) to the local cache; label reference with tag “experiment1”

globus-gass-cache add -r tuva.mcs.anl.gov-fork \

x-gass://host:port/file2

Add file “file2” (located at x-gass://host:port) to the cache at tuva.mcs.anl.gov-fork

Page 26: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

May 9, 2003 26Globus Toolkit™ Developer Tutorial: Data

globus_gass_cache

Module for manipulating the GASS cache– globus_gass_cache_open(), …_close()

– globus_gass_cache_add(), …_add_done()

– globus_gass_cache_delete_start(), …_delete()

– globus_gass_cache_cleanup_tag()

– globus_gass_cache_cleanup_file()

– globus_gass_cache_list()

This module does NOT fill in the contents of the cache files. It just handles, manages naming and lifetimes of files.

Page 27: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

May 9, 2003 27Globus Toolkit™ Developer Tutorial: Data

globus_gass_transfer

Common API for transferring remote files/data over various protocols– http and https currently supported

– ftp will be supported in future release

Supports put and get operations on an URL

Allows for efficient transfer to/from files or directly to/from memory

Allows any application to easily add customized file/data service capabilities

Page 28: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

May 9, 2003 28Globus Toolkit™ Developer Tutorial: Data

globus_gass_copy

Simple API for copying data from a source to a destination– URL used for source and destination

– http(s), (gsi)ftp, file

Page 29: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

May 9, 2003 29Globus Toolkit™ Developer Tutorial: Data

globus-gass-server

Simple file server– Run by user wherever necessary– Secure https protocol, using GSI– APIs for embedding server into other programs

Exampleglobus-gass-server –r –w -t

– -r: Allow files to be read from this server– -w: Allow files to be written to this server– -t: Tilde expand (~/… $(HOME)/…)– -help: For list of all options

Page 30: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

May 9, 2003 30Globus Toolkit™ Developer Tutorial: Data

globus_gass_server_ez

Very simply API for adding file service to any application– Wrapper around globus_gass_transfer

globusrun uses this module to support executable staging, stdout/stderrredirection, and remote file access

Page 31: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

May 9, 2003 31Globus Toolkit™ Developer Tutorial: Data

GRAM & GASS: Putting It Together1. Derive Contact String2. Build RSL string3. Startup GASS server4. Submit to request5. Return output

jobmanager

gatekeeper

program

stdout

GASS server

3

4

globus-job-run

Hostname

Contactstring

1

RSLstring

2CommandLine Args

4

55

55 4

Page 32: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

May 9, 2003 32Globus Toolkit™ Developer Tutorial: Data

Globus Components In ActionLocal Machine

mpirun

globusrun

GRAM

ClientGSI

GRAM

ClientGSI

Remote Machine

AppNexus

AIX

PBS

MPI

grid-proxy-initX509UserCert

UserProxyCert

Machines

GRAM Gatekeeper

GSI

GRAM Job Manager

GASS Client

Remote Machine

AppNexus

Solaris

Unix Fork

MPI

GRAM Gatekeeper

GSI

GRAM Job Manager

GASS Client

RSL string

RSL multi-request

RSL single requestDUROC

GASS Server

RSL parser

Page 33: Globus Toolkit Developer Tutorial: Dataszkovacs/ParhRendszSeg/... · 2003-05-09 · May 9, 2003 Globus Toolkit™ Developer Tutorial: Data 15. GASS File API Semantics. z. Copy on

May 9, 2003 33Globus Toolkit™ Developer Tutorial: Data

Köszönöm a figyelmüketKöszönöm a figyelmüket

?

További információ: www.lpds.sztaki.hu