Data Management
Globus Toolkit™ Developer Tutorial
The Globus Project™
Argonne National Laboratory
USC Information Sciences Institute
http://www.globus.org/
Copyright (c) 2002 University of Chicago and The University of Southern California. All Rights Reserved. This presentation is licensed for use under the terms of the Globus Toolkit Public License. See http://www.globus.org/toolkit/download/license.html for the full text of this license.
May 9, 2003 · Globus Toolkit™ Developer Tutorial: Data
Data Management Services
Data transfer and access
– GASS: Simple, multi-protocol file transfer tools; integrated with GRAM
– GridFTP: Provides high-performance, reliable data transfer for modern WANs
Data replication and management
– Replica Catalog: Provides a catalog service for keeping track of replicated datasets
– Replica Management: Provides services for creating and managing replicated datasets
GASS: Remote I/O and Staging
Used by GRAM to:
– Pull executable from remote location
– Move stdin/stdout/stderr from/to a remote location
Access files from a remote location
What is GASS?
Global Access to Secondary Storage
(a) GASS file access API
– Replace open/close with globus_gass_open/close; read/write calls can then proceed locally
(b) RSL extensions
– URLs used to name executables, stdout, stderr
(c) Remote cache management utility
– File cache: a local secondary storage area in the execution machine where copies of remote files are stored
(d) Low-level APIs for specialized behaviors
GASS Architecture
[Figure: GASS architecture. Labeled elements: an application, main( ) { fd = globus_gass_open(…) … read(fd,…) … globus_gass_close(fd) }, using (a) the GASS file access API; GRAM with (b) RSL extensions, e.g. &(executable=https://…); caches with (c) remote cache management via % globus-gass-cache; (d) low-level APIs for customizing the cache and GASS server; remote GASS server, HTTP server, and FTP server.]
GASS File Naming
URL encoding of resource names
    https://quad.mcs.anl.gov:9991/~bester/myjob
    (protocol: https; server address: quad.mcs.anl.gov:9991; file name: /~bester/myjob)
Other examples
    https://pitcairn.mcs.anl.gov/tmp/input_dataset.1
    https://pitcairn.mcs.anl.gov:2222/./output_data
    http://www.globus.org/~bester/input_dataset.2
Currently supports http & https
A future release will also support ftp & gridftp.
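The protocol / server address / file name split above can be illustrated with a short Python sketch. It uses only the standard library's urlsplit; the helper names split_gass_url and is_supported are hypothetical and are not part of GASS.

```python
from urllib.parse import urlsplit

# Schemes the slide lists as currently supported.
SUPPORTED_PROTOCOLS = {"http", "https"}


def split_gass_url(url):
    """Split a URL into (protocol, server address, file name)."""
    parts = urlsplit(url)
    return parts.scheme, parts.netloc, parts.path


def is_supported(url):
    """True if the URL's protocol is one the slide lists as supported."""
    return urlsplit(url).scheme in SUPPORTED_PROTOCOLS


protocol, server, file_name = split_gass_url(
    "https://quad.mcs.anl.gov:9991/~bester/myjob"
)
print(protocol, server, file_name)
```

The port stays attached to the server address, matching the labeling on the slide.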
GASS RSL Extensions
executable, stdin, stdout, stderr can be local files or URLs
executable and stdin loaded into local cache before job begins (on front-end node)
stdout, stderr handled via GASS append mode
Cache cleaned after job completes
GASS/RSL Example
&(executable=https://quad:1234/~/myexe)
 (stdin=https://quad:1234/~/myin)
 (stdout=/home/bester/output)
 (stderr=https://quad:1234/dev/stdout)
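As a rough illustration, an RSL string like the one above can be assembled from name/value pairs. This is a minimal Python sketch: build_rsl is a hypothetical helper, not a Globus function, and it assumes the values need no quoting or escaping.

```python
def build_rsl(attributes):
    """Join (name, value) pairs into one RSL conjunction string.

    Assumption: values contain no characters that would need RSL quoting.
    """
    return "&" + "".join(f"({name}={value})" for name, value in attributes)


rsl = build_rsl([
    ("executable", "https://quad:1234/~/myexe"),   # staged from a GASS URL
    ("stdin", "https://quad:1234/~/myin"),         # staged from a GASS URL
    ("stdout", "/home/bester/output"),             # plain local file
    ("stderr", "https://quad:1234/dev/stdout"),    # appended to a remote URL
])
print(rsl)
```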
Example GASS Applications
On-demand, transparent loading of data sets
Caching of (small) data sets
Automatic staging of code and data to remote supercomputers
(Near) real-time logging of application output to remote server
GASS File Access API
Minimal changes to the application
globus_gass_open(), globus_gass_close()
– Same as open(), close() but use URLs instead of filenames
– Cache the URL in case of multiple opens
– Return descriptors to files in local cache or sockets to remote server
GASS File Access API (cont)
Support for different access patterns
– Read-only (from local cache)
– Write-only (to local cache)
– Write-only, append (to remote server)
In all cases, the general assumption is that there is no concurrent access to a file by several application programs
Read-only access
[Figure: Read-only access. A file at a remote location is copied across the Internet to a file copy at Site 1 and a file copy at Site 2; processes P1–P5 read the entire file. Multiple readers.]
Write-only access
[Figure: Write-only access. Processes P1–P5 at Site 1 and Site 2 write an entire file locally; the files are sent across the Internet to the file at the remote location. Multiple writers: last writer wins.]
Append-only access
[Figure: Append-only access. Processes P1–P5 at Site 1 and Site 2 append blocks across the Internet to a file at the remote location. Multiple writers.]
GASS File API Semantics
Copy on open to cache if
– not write-only append, and
– not already in cache
Copy on close from cache if
– not read-only, and
– no other copies open
Multiple globus_gass_open() calls share a local copy of the file
Reference counting keeps track of open files
Append to remote file if write-only append mode: e.g., for stdout and stderr
globus_gass_open()
Strategy: Fetch and cache on first read open
[Flowchart: URL in cache? If no, download the file into the cache. In both cases, open the cached file and add a cache reference.]
Optimized for parallel processing where several processes access the same file
GASS File Open
Advantages:
– The file is transferred only once even if it is used by several processes
– The file in cache can be accessed locally by conventional I/O calls
Disadvantages if the file is too large:
– Computation may be delayed too long
– Local cache may be too small to store the entire file
Solutions:
– Prestaging
– Specialized GASS servers
globus_gass_close()
Strategy: Flush cache and transfer on last close (for a write file)
[Flowchart: decision "Modified?" (yes/no); actions "Upload changes" and "Remove cache reference".]
GASS File Close
Solution: The reference count is checked.
– If it is one, the file is copied back to the remote location and deleted from the cache
– Otherwise, the reference count is decremented
Advantages:
– Reduces bandwidth requirements when multiple processes at the same location write to the same file
– The file in cache can be accessed locally by conventional I/O calls
– Conflicts are resolved locally, not remotely, and the file is transferred only once
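The reference-counted open and close behavior can be simulated in a few lines of Python. This is a toy in-memory model under stated assumptions, not the GASS implementation: ToyGassCache and its counters are hypothetical, a dict stands in for the remote servers, and every file is treated as writable.

```python
class ToyGassCache:
    """In-memory stand-in for a GASS file cache (not the Globus API)."""

    def __init__(self, remote):
        self.remote = remote    # URL -> bytes; stands in for remote servers
        self.cached = {}        # URL -> bytearray; the shared local copy
        self.refcount = {}      # URL -> number of outstanding opens
        self.downloads = 0
        self.uploads = 0

    def open(self, url):
        # Fetch only on the first open; later opens share the local copy.
        if url not in self.cached:
            self.cached[url] = bytearray(self.remote.get(url, b""))
            self.downloads += 1
        self.refcount[url] = self.refcount.get(url, 0) + 1
        return self.cached[url]

    def close(self, url):
        if self.refcount[url] == 1:
            # Last close: copy back to the remote location, drop from cache.
            self.remote[url] = bytes(self.cached.pop(url))
            del self.refcount[url]
            self.uploads += 1
        else:
            self.refcount[url] -= 1


url = "https://quad:1234/~/results"      # hypothetical URL
remote = {url: b"abc"}
cache = ToyGassCache(remote)
first = cache.open(url)
second = cache.open(url)                  # same local copy, no second download
second.extend(b"def")                     # local write through conventional I/O
cache.close(url)                          # count drops from 2 to 1, no upload
cache.close(url)                          # last close uploads once
print(cache.downloads, cache.uploads, remote[url])
```

Two processes at the same site would share one download and one upload, which is the bandwidth saving the slide describes.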
Special case: write-only append
Append to remote file if write-only-append mode: e.g., for stdout and stderr
A remote file that is opened in write-only-append mode is not placed in the cache
Rather:
– A communication stream is created to the remote location
– Write operations to the file are translated into communication operations on that stream
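The idea of turning file writes into operations on a stream can be sketched as follows. This Python example is illustrative only: AppendOnlyRemoteFile is a hypothetical class, and an io.BytesIO object stands in for the network connection to the remote location.

```python
import io


class AppendOnlyRemoteFile:
    """File-like object whose writes go straight to a stream, with no cache."""

    def __init__(self, stream):
        self.stream = stream    # assumed to be a connection to the remote side

    def write(self, data):
        # Each write becomes a send on the stream, so output arrives
        # at the remote location in (near) real time.
        self.stream.write(data)
        self.stream.flush()
        return len(data)


remote_end = io.BytesIO()       # stand-in for the remote server's file
out = AppendOnlyRemoteFile(remote_end)
out.write(b"step 1 done\n")
out.write(b"step 2 done\n")
print(remote_end.getvalue())
```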
Remote Cache Management Utilities
Remote management of caches, for
– Cache cleanup and management
Support operations on local & remote caches
Functionality encapsulated in a program: globus-gass-cache
GASS Cache Semantics
For each “file” in the cache, we record
– Local file name
– URL (i.e., the remote location)
– Reference count: a set of tagged references
Tags associated with references allow clean up of cache, e.g. following failure
– Tag is job_manager_contact (if file accessed via file access API) or programmer-specified
– Commands allow “remove all refs with tag T”
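The per-file record and the "remove all refs with tag T" operation might be modeled as below. This Python sketch is an assumption about one possible data structure, not the actual cache layout; CacheEntry, TaggedCache, and the sample tags and paths are hypothetical.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class CacheEntry:
    url: str                    # the remote location
    local_file_name: str
    refs: Counter = field(default_factory=Counter)  # tag -> reference count


class TaggedCache:
    def __init__(self):
        self.entries = {}       # URL -> CacheEntry

    def add(self, url, tag, local_file_name):
        """Add one reference to url, labeled with tag."""
        entry = self.entries.setdefault(url, CacheEntry(url, local_file_name))
        entry.refs[tag] += 1

    def cleanup_tag(self, tag):
        """Remove all references with this tag; drop files left unreferenced."""
        for url in list(self.entries):
            entry = self.entries[url]
            entry.refs.pop(tag, None)
            if not entry.refs:
                del self.entries[url]


cache = TaggedCache()
cache.add("x-gass://host:port/file1", "jobA", "/tmp/cache/1")
cache.add("x-gass://host:port/file1", "jobB", "/tmp/cache/1")
cache.add("x-gass://host:port/file2", "jobA", "/tmp/cache/2")
cache.cleanup_tag("jobA")       # e.g. after job A fails
print(sorted(cache.entries))
```

After the cleanup only file1 remains, held by the jobB reference, which is how tags let a failed job's files be reclaimed without touching others.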
globus-gass-cache Specification
globus-gass-cache op [-r resource] [-t tag] URL
Where op is one of
– add : add URL to cache with tag
– delete : remove one reference of tag for URL
– cleanup_tag : remove all refs of tag for URL
– cleanup_url : remove specified URL from cache
– list : list contents of cache
URL is optional for cleanup_tag and list
If resource not specified, default to local cache
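A small helper can make the rules of this specification explicit when composing command lines from a script. This Python sketch only builds an argument list following the usage line above; gass_cache_argv is a hypothetical name, and the sketch does not run the command or check what a real installation accepts.

```python
OPS = {"add", "delete", "cleanup_tag", "cleanup_url", "list"}
URL_OPTIONAL = {"cleanup_tag", "list"}   # per the slide, URL may be omitted


def gass_cache_argv(op, url=None, resource=None, tag=None):
    """Build an argv list of the form: globus-gass-cache op [-r resource] [-t tag] URL."""
    if op not in OPS:
        raise ValueError(f"unknown op: {op!r}")
    if url is None and op not in URL_OPTIONAL:
        raise ValueError(f"op {op!r} requires a URL")
    argv = ["globus-gass-cache", op]
    if resource is not None:             # omitted means the local cache
        argv += ["-r", resource]
    if tag is not None:
        argv += ["-t", tag]
    if url is not None:
        argv.append(url)
    return argv


print(gass_cache_argv("add", "x-gass://host:port/file1", tag="experiment1"))
print(gass_cache_argv("list"))
```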
globus-gass-cache Examples
globus-gass-cache add -t experiment1 x-gass://host:port/file1
Add file “file1” (located at x-gass://host:port) to the local cache; label reference with tag “experiment1”
globus-gass-cache add -r tuva.mcs.anl.gov-fork \
x-gass://host:port/file2
Add file “file2” (located at x-gass://host:port) to the cache at tuva.mcs.anl.gov-fork
globus_gass_cache
Module for manipulating the GASS cache
– globus_gass_cache_open(), …_close()
– globus_gass_cache_add(), …_add_done()
– globus_gass_cache_delete_start(), …_delete()
– globus_gass_cache_cleanup_tag()
– globus_gass_cache_cleanup_file()
– globus_gass_cache_list()
This module does NOT fill in the contents of the cache files. It only manages the naming and lifetimes of files.
globus_gass_transfer
Common API for transferring remote files/data over various protocols
– http and https currently supported
– ftp will be supported in a future release
Supports put and get operations on a URL
Allows for efficient transfer to/from files or directly to/from memory
Allows any application to easily add customized file/data service capabilities
globus_gass_copy
Simple API for copying data from a source to a destination
– URL used for source and destination
– http(s), (gsi)ftp, file
globus-gass-server
Simple file server
– Run by user wherever necessary
– Secure https protocol, using GSI
– APIs for embedding server into other programs
Example
    globus-gass-server -r -w -t
– -r: Allow files to be read from this server
– -w: Allow files to be written to this server
– -t: Tilde expand (~/… → $(HOME)/…)
– -help: For list of all options
globus_gass_server_ez
Very simple API for adding file service to any application
– Wrapper around globus_gass_transfer
globusrun uses this module to support executable staging, stdout/stderr redirection, and remote file access
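GRAM & GASS: Putting It Together
1. Derive Contact String
2. Build RSL string
3. Startup GASS server
4. Submit to request
5. Return output
[Figure: globus-job-run flow, with arrows numbered by step. Hostname to contact string (1); command-line args to RSL string (2); GASS server started (3); request passed to the gatekeeper and jobmanager, which run the program (4); program stdout returned through the GASS server (5).]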
Globus Components In Action
[Figure: Globus components in action. Local machine: grid-proxy-init (X509 user cert, user proxy cert); mpirun with a machines list; globusrun with an RSL parser, handling the RSL string, RSL multi-request, DUROC, and RSL single requests; two GRAM clients with GSI; a GASS server. Two remote machines, each with a GRAM gatekeeper (GSI), a GRAM job manager, a GASS client, and an application using Nexus and MPI: one runs AIX with PBS, the other Solaris with Unix fork.]
Thank you for your attention
?
Further information: www.lpds.sztaki.hu