© 2007 Open Grid Forum
Data Management Challenge - The View from OGF
OGF22 – February 28, 2008
Cambridge, MA, USA
Erwin Laure <[email protected]>
David E. Martin <[email protected]>
Data Area Directors
Early Grid View of Grids
• Early Grid systems had a quite simplistic view:
  1. Dispatch a job to a machine
  2. GridFTP files to the machine from “Somewhere”
  3. Run the job
  4. GridFTP results to “Somewhere”
• Grids defined “Computing Elements (CE)”
• Data and storage were considered to be “there”
  • Storage Elements (SE) concept came much later
• Barely OK for Initial Data Analysis
  • Physics, Geosciences, etc.
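The four-step job flow listed above can be sketched as a small simulation. This is only an illustration under loud assumptions: local file copies stand in for GridFTP transfers, a scratch directory stands in for the remote machine, and the names `run_early_grid_job` and `count_lines` are hypothetical.

```python
import shutil
import tempfile
from pathlib import Path
from typing import Callable


def run_early_grid_job(
    input_file: Path,
    output_dir: Path,
    job: Callable[[Path, Path], None],
) -> Path:
    """Simulate the early Grid job flow; file copies stand in for GridFTP."""
    # 1. Dispatch the job to a machine (assumption: a scratch directory).
    worker_dir = Path(tempfile.mkdtemp(prefix="worker-"))
    try:
        # 2. Stage the input from "somewhere" onto the machine.
        staged_in = worker_dir / input_file.name
        shutil.copy(input_file, staged_in)
        # 3. Run the job on the machine.
        result = worker_dir / "result.out"
        job(staged_in, result)
        # 4. Stage the result back out to "somewhere".
        output_dir.mkdir(parents=True, exist_ok=True)
        staged_out = output_dir / result.name
        shutil.copy(result, staged_out)
        return staged_out
    finally:
        # The machine keeps nothing; data lives only "somewhere" else.
        shutil.rmtree(worker_dir, ignore_errors=True)


def count_lines(src: Path, dst: Path) -> None:
    """Toy job: write the number of lines in src to dst."""
    dst.write_text(str(len(src.read_text().splitlines())))
```

Note that nothing in this flow knows where "somewhere" is or who manages it, which is the gap the later slides address.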
Then Data kicked in …
• Compute jobs have to deal with input/output data, transient data
• Data is
  • Heterogeneous (storage, data formats)
  • Distributed
  • Independently managed
The Grid Grows Up
• Database Access
  • DAIS
• Storage/File Management
  • SRM
• File/Data Transfer
  • GridFTP, RFT, FTS
• Data Location
  • RLS, LFC
• Metadata
• Data Management Systems
  • SRB
  • …
SRM Interactions
[Figure: Client, SRM, and Storage connected by arrows numbered 1 to 5]
1. The client asks the SRM for the file, providing an SURL (Site URL)
2. The SRM asks the storage system to provide the file
3. The storage system notifies the availability of the file and its location
4. The SRM returns a TURL (Transfer URL), i.e. the location from where the file can be accessed
5. The client interacts with the storage using the protocol specified in the TURL
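The five-step SRM exchange can be mimicked with a toy model. This sketch is not the SRM interface itself: the class names, the `get_turl` and `make_available` methods, the `gsiftp` default, and the example hosts and paths are all hypothetical, and the storage back end is an in-memory dictionary.

```python
from dataclasses import dataclass, field
from urllib.parse import urlparse


@dataclass
class ToyStorage:
    """Hypothetical storage system: maps site paths to data-server paths."""
    host: str
    files: dict[str, str] = field(default_factory=dict)

    def make_available(self, path: str) -> str:
        # Steps 2-3: provide the file and report where it can be read.
        if path not in self.files:
            raise FileNotFoundError(path)
        return f"{self.host}{self.files[path]}"


@dataclass
class ToySRM:
    """Hypothetical SRM front end that turns SURLs into TURLs."""
    storage: ToyStorage
    protocol: str = "gsiftp"  # assumed transfer protocol for the TURL

    def get_turl(self, surl: str) -> str:
        # Step 1: the client hands over an SURL.
        path = urlparse(surl).path
        location = self.storage.make_available(path)
        # Step 4: return a TURL naming the protocol and the location.
        return f"{self.protocol}://{location}"


storage = ToyStorage(
    host="disk01.example.org",
    files={"/data/run1.root": "/pool3/run1.root"},
)
srm = ToySRM(storage)
turl = srm.get_turl("srm://srm.example.org/data/run1.root")
# Step 5: the client would now contact the storage using the TURL's protocol.
print(turl)
```

The point of the indirection is that the SURL stays stable while the TURL can change with the storage layout or the negotiated protocol.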
[Figure: OGSA-DAI service architecture. Labels: Application, Client Toolkit; OGSA-DAI service with Engine; Activities (SQLQuery, XPath, XSLT, GZip, GridFTP, readFile); Data Resources (JDBC, XMLDB, File); Databases (MySQL, DB2, SQLServer, Xindice, SWISSPROT)]
GridFTP and RFT
[Figure: globus-url-copy and the RFT Service each shown with paired Control and Data channels; the RFT Client talks to the RFT Service via SOAP messages, with optional notifications]
gLite FTS
• Channel: logical unit of management
  • Represents a directed network pipe between two sites
• Mono-directional, dedicated link
  • Independently manageable
• State
  • Number of streams
  • Number of concurrent transfers
• Inter-VO scheduling
  • VO share
• No routing involved
• Non-dedicated channels
  • E.g. star channel
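The channel properties listed above can be captured in a small data structure. This is a sketch under assumptions, not the gLite FTS schema: the field names, the `"*"` wildcard for a star channel, the proportional split of transfer slots by VO share, and the site and VO names are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    """Hypothetical model of a directed transfer channel between two sites."""
    source: str                      # "*" marks a non-dedicated (star) end
    destination: str
    state: str = "Active"            # assumed state label
    streams: int = 1                 # parallel streams per transfer
    max_concurrent: int = 10         # concurrent transfers on the channel
    vo_shares: dict[str, int] = field(default_factory=dict)

    def matches(self, src: str, dst: str) -> bool:
        # Direction matters: (A, B) does not serve a transfer from B to A.
        return self.source in ("*", src) and self.destination in ("*", dst)

    def slots_for(self, vo: str) -> int:
        # Assumed policy: split transfer slots in proportion to VO shares.
        total = sum(self.vo_shares.values())
        if total == 0 or vo not in self.vo_shares:
            return 0
        return self.max_concurrent * self.vo_shares[vo] // total


dedicated = Channel("CERN", "RAL", streams=5, max_concurrent=20,
                    vo_shares={"atlas": 60, "cms": 40})
star = Channel("*", "RAL")
```

Because no routing is involved, a transfer is simply matched to a channel by its endpoints; it is never forwarded across several channels.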
SRB as a Data Grid
[Figure: several SRB servers; MCAT catalog backed by a DB]
• Data Grid has an arbitrary number of servers
• Complexity is hidden from users
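One way to picture how a catalog hides the server layout from users is a lookup table from logical names to physical locations. The sketch below is only an analogy for the MCAT role in the figure; it does not use the SRB API, and every class, method, host, and path in it is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyCatalog:
    """Hypothetical metadata catalog: logical name -> (server, physical path)."""
    entries: dict[str, tuple[str, str]] = field(default_factory=dict)

    def register(self, logical: str, server: str, physical: str) -> None:
        self.entries[logical] = (server, physical)

    def locate(self, logical: str) -> tuple[str, str]:
        return self.entries[logical]


@dataclass
class ToyDataGrid:
    """Any number of servers; the catalog records which one holds the data."""
    catalog: ToyCatalog
    servers: dict[str, dict[str, bytes]] = field(default_factory=dict)

    def put(self, logical: str, server: str, physical: str, data: bytes) -> None:
        self.servers.setdefault(server, {})[physical] = data
        self.catalog.register(logical, server, physical)

    def get(self, logical: str) -> bytes:
        # The user supplies only the logical name, never a server or path.
        server, physical = self.catalog.locate(logical)
        return self.servers[server][physical]


grid = ToyDataGrid(ToyCatalog())
grid.put("/home/demo/climate.nc", "srb2.example.org", "/vault/0001", b"\x00\x01")
grid.put("/home/demo/notes.txt", "srb5.example.org", "/disk/a/77", b"hello")
```

A call such as `grid.get("/home/demo/notes.txt")` returns the bytes without the caller ever naming a server.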
Need for Grid Data Architecture
• and Standards
• OGF OGSA Data Architecture WG
  • Started in October 2005
  • Data Architecture document published as GFD.121
OGSA-Data Architecture
[Figure: OGSA data architecture. Labels: Client APIs (non-OGSA) / Other services; Service interface; Data Service (Access, Description, Sink/Source); Storage Management; Resource interface; Managed Storage; Stored Data Resources; Other Data Resources]
OGSA-Data: Data Replication/Transfer
[Figure: OGSA-Data replication and transfer. Labels: Client APIs (non-OGSA) / Other services; Service interface; Data Service (Access, Description, Sink/Source, Transfer); Replication; Transfer Protocols; Resource interface; Data Resources]
OGF Data Area WGs I
• Data Format Description Language WG (dfdl-wg)
  • Describe the structure of binary and character encoded files and data streams
• Database Access and Integration Services WG (dais-wg)
  • Provide consistent access to existing, autonomously managed databases from web services
• Grid File System Working Group (gfs-wg)
  • Service interface(s) and architecture of a logical file system
• Grid Storage Management WG (gsm-wg)
  • Provide dynamic space allocation and file management of shared storage components on the Grid (Storage Resource Manager – SRM)
• GridFTP WG (gridftp-wg)
  • Improvements of FTP suitable for grid applications
OGF Data Area WGs II
• Info Dissemination WG (infod-wg)
  • Develop a model for Information Dissemination
• OGSA ByteIO Working Group (byteio-wg)
  • Define a minimal Web Service interface for providing "POSIX-like" file functionality
• OGSA Data Movement Interface WG (ogsa-dmi-wg)
  • Managed data movement
• OGSA-Data Working Group (ogsa-d-wg)
  • Data Architecture
Activities related to file system and data movement
• GFS:
  • Resource Namespace Service Specification (GFD.101)
• Byte-IO:
  • Byte-IO OGSA WSRF Basic Profile Rendering (GFD.88)
• GSM:
  • The Storage Resource Manager Interface Specification Version 2.2 (in public comment)
• DMI:
  • OGSA-DMI Specification (in public comment)
Data Architecture: Gaps
• Standardized metadata
  • Identify query languages, data formats, transport protocols, …
  • Needed in DAIS, DMI, ByteIO, …
• Data catalogs & Registries
  • Discovery an important part of Grids
• Replication/Caching
• Data Federation
Standards Gaps
• Caching and Replication
• Integrated Data Management
• Transactions in a Grid
• Storage Provisioning
• Virtualization
• Provenance, Integrity, Policy
• File Metadata
• Streaming
• Versioning
Standards Gaps
• Dependencies
  • Security: IETF, OGF
  • Management: DMTF, SNIA
  • WS-*: OASIS and W3C
Main Focus for Future Work
• File systems
  • NFSv4, pNFS
• Interface to Metadata stores
• Policies (not only Data)
• Name your favorite
Where can we exploit synergies with SNIA?