© 2007 Open Grid Forum Data Management Challenge - The View from OGF OGF22 – February 28, 2008 Cambridge, MA, USA Erwin Laure <[email protected]> David E. Martin <[email protected]> Data Area Directors


© 2007 Open Grid Forum

Data Management Challenge - The View from OGF
OGF22 – February 28, 2008

Cambridge, MA, USA

Erwin Laure <[email protected]>
David E. Martin <[email protected]>
Data Area Directors


Early Grid View of Grids

• Early Grid systems had a quite simplistic view:
  1. Dispatch a job to a machine
  2. GridFTP files to the machine from “Somewhere”
  3. Run the job
  4. GridFTP results to “Somewhere”

• Grids defined “Computing Elements (CE)”
  • Data and storage were considered to be “there”
  • Storage Elements (SE) concept came much later

• Barely OK for Initial Data Analysis
  • Physics, Geosciences, etc.


Then Data kicked in …

• Compute jobs have to deal with input/output data, transient data

• Data is
  • Heterogeneous (storage, data formats)
  • Distributed
  • Independently managed


The Grid Grows Up

• Database Access
  • DAIS

• Storage/File Management
  • SRM

• File/Data Transfer
  • GridFTP, RFT, FTS

• Data Location
  • RLS, LFC

• Metadata

• Data Management Systems
  • SRB

• …


SRM Interactions

[Figure: a client, the SRM, and the storage system, with arrows numbered 1 to 5 matching the steps below]

1. The client asks the SRM for the file, providing an SURL (Site URL)
2. The SRM asks the storage system to provide the file
3. The storage system notifies the availability of the file and its location
4. The SRM returns a TURL (Transfer URL), i.e. the location from where the file can be accessed
5. The client interacts with the storage using the protocol specified in the TURL
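
The SURL-to-TURL exchange can be illustrated with a small, purely hypothetical model. The sketch below is not the SRM v2.2 interface: the class names `StorageSystem` and `SRM`, the method `get_turl`, the host names, and the `gsiftp` default protocol are all assumptions chosen for illustration. It only shows the idea that a client hands in a site-level URL and gets back a protocol-specific location.

```python
from dataclasses import dataclass, field


@dataclass
class StorageSystem:
    """Hypothetical back end that maps site paths to physical locations."""
    host: str
    protocol: str = "gsiftp"  # assumed transfer protocol
    files: dict = field(default_factory=dict)  # site path -> physical path

    def stage(self, site_path: str) -> str:
        # Steps 2-3: make the file available and report where it lives.
        return self.files[site_path]


@dataclass
class SRM:
    """Hypothetical SRM front end (illustration only, not the real API)."""
    storage: StorageSystem

    def get_turl(self, surl: str) -> str:
        # Step 1: the client provides an SURL such as srm://host/path.
        scheme, rest = surl.split("://", 1)
        if scheme != "srm":
            raise ValueError(f"not an SURL: {surl}")
        _, _, site_path = rest.partition("/")
        physical = self.storage.stage("/" + site_path)
        # Step 4: return a TURL naming the protocol and location to use.
        return f"{self.storage.protocol}://{self.storage.host}{physical}"


storage = StorageSystem(
    host="disk01.example.org",
    files={"/data/run42.root": "/pool3/0001/run42.root"},
)
srm = SRM(storage)
turl = srm.get_turl("srm://srm.example.org/data/run42.root")
print(turl)  # step 5 would open this TURL with the named protocol
```

The point of the indirection is that the client never needs to know which disk server or protocol sits behind the site name; only the TURL carries that detail.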


[Figure: OGSA-DAI service architecture. Labels: Application, Client Toolkit; OGSA-DAI service, Engine; Activities: SQLQuery, XPath, XSLT, readFile, GZip, GridFTP; Data Resources: JDBC, XMLDB, File; Databases: MySQL, DB2, SQLServer, XIndice, SWISSPROT]


GridFTP and RFT

[Figure: globus-url-copy and the RFT Service each driving GridFTP servers over separate control and data channels; the RFT Client talks to the RFT Service via SOAP messages, with optional notifications]


gLite FTS

• Logical unit of management
  • Represents a directed network pipe between two sites
  • Mono-directional, dedicated link

• Independently manageable
  • State
  • Number of streams
  • Number of concurrent transfers

• Inter-VO scheduling
  • VO share

• No routing involved

• Non-dedicated channels
  • E.g. star channel
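
As a rough illustration of the channel properties listed above, the sketch below models a channel as a plain record and splits its concurrent transfer slots among VOs in proportion to their shares. This is not the gLite FTS API or its actual scheduling algorithm: the `Channel` class, its field names, the `"*"` wildcard convention for a star channel, and the rounding rule are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Channel:
    """Hypothetical model of a one-way pipe from source site to dest site."""
    source: str                     # "*" stands for a star (non-dedicated) channel
    dest: str
    state: str = "Active"
    streams: int = 1                # parallel streams per transfer
    concurrent_transfers: int = 10  # transfers allowed at the same time
    vo_shares: dict = field(default_factory=dict)  # VO name -> relative share

    def serves(self, src: str, dst: str) -> bool:
        # No routing: a transfer uses a channel only if both ends match.
        return self.source in ("*", src) and self.dest in ("*", dst)

    def slots_per_vo(self) -> dict:
        """Split concurrent transfer slots in proportion to the VO shares."""
        total = sum(self.vo_shares.values())
        if total == 0:
            return {}
        slots = {vo: self.concurrent_transfers * share // total
                 for vo, share in self.vo_shares.items()}
        # Hand slots lost to rounding to the largest shares first.
        leftover = self.concurrent_transfers - sum(slots.values())
        ranked = sorted(self.vo_shares, key=self.vo_shares.get, reverse=True)
        for vo in ranked[:leftover]:
            slots[vo] += 1
        return slots


dedicated = Channel("CERN", "RAL", streams=4, concurrent_transfers=10,
                    vo_shares={"atlas": 50, "cms": 30, "lhcb": 20})
star = Channel("*", "RAL", concurrent_transfers=7,
               vo_shares={"atlas": 50, "cms": 30, "lhcb": 20})

print(dedicated.slots_per_vo())
print(star.serves("FNAL", "RAL"), dedicated.serves("FNAL", "RAL"))
```

Keeping the share calculation on the channel itself mirrors the slide's point that each channel is managed independently of the others.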


SRB as a Data Grid

[Figure: several interconnected SRB servers, one of them attached to the MCAT catalog database (DB)]

• Data Grid has arbitrary number of servers
• Complexity is hidden from users


Need for Grid Data Architecture and Standards

• OGF OGSA Data Architecture WG
  • Started in October 2005
  • Data Architecture document published as GFD.121


OGSA-Data Architecture

[Figure: OGSA data architecture diagram. Labels: Client APIs (non-OGSA) / Other services; Data Service; Access, Description, Sink/Source; Storage Management; Storage, Managed Storage; Stored Data Resources; Other Data Resources; Service interface; Resource interface]


OGSA-Data: Data Replication/Transfer

[Figure: two Data Services, each exposing Access, Description, and Sink/Source over its Data Resources, linked by Transfer and Replication through Transfer Protocols. Other labels: Client APIs (non-OGSA) / Other services; Service interface; Resource interface]


OGF Data Area WGs I

• Data Format Description Language WG (dfdl-wg)
  • Describe the structure of binary and character encoded files and data streams

• Database Access and Integration Services WG (dais-wg)
  • Provide consistent access to existing, autonomously managed databases from web services

• Grid File System Working Group (gfs-wg)
  • Service interface(s) and architecture of a logical file system

• Grid Storage Management WG (gsm-wg)
  • Provide dynamic space allocation and file management of shared storage components on the Grid (Storage Resource Manager – SRM)

• GridFTP WG (gridftp-wg)
  • Improvements of FTP suitable for grid applications


OGF Data Area WGs II

• Info Dissemination WG (infod-wg)
  • Develop a model for Information Dissemination

• OGSA ByteIO Working Group (byteio-wg)
  • Define a minimal Web Service interface for providing "POSIX-like" file functionality

• OGSA Data Movement Interface WG (ogsa-dmi-wg)
  • Managed data movement

• OGSA-Data Working Group (ogsa-d-wg)
  • Data Architecture


Activities related to file system and data movement

• GFS:
  • Resource Namespace Service Specification (GFD.101)

• Byte-IO:
  • Byte-IO OGSA WSRF Basic Profile Rendering (GFD.88)

• GSM:
  • The Storage Resource Manager Interface Specification Version 2.2 (in public comment)

• DMI:
  • OGSA-DMI Specification (in public comment)


Data Architecture: Gaps

• Standardized metadata
  • Identify query languages, data formats, transport protocols, …
  • Needed in DAIS, DMI, ByteIO, …

• Data catalogs & Registries
  • Discovery an important part of Grids

• Replication/Caching

• Data Federation


Standards Gaps

• Caching and Replication
• Integrated Data Management
• Transactions in a Grid
• Storage Provisioning
• Virtualization
• Provenance, Integrity, Policy
• File Metadata
• Streaming
• Versioning


Standards Gaps

• Dependencies
  • Security: IETF, OGF
  • Management: DMTF, SNIA
  • WS-*: OASIS and W3C


Main Focus for Future Work

• File systems
  • NFSv4, pNFS

• Interface to Metadata stores

• Policies (not only Data)

• Name your favorite

Where can we exploit synergies with SNIA?