Storage Management in Grid December 11, 2002 Sangyong Ha, Chan-Hyun Youn Information and Communications Univ.

Storage Management in Grid

December 11, 2002

Sangyong Ha, Chan-Hyun Youn

Information and Communications Univ.

Outline
- Background
- Existing Storage Management Systems
  - DPSS
  - SRB
  - Data management in Globus
- Our Approach
- Concluding Remarks

Background
- Problem
  - To enable a geographically distributed community to perform analyses on petabytes of data efficiently and cost-effectively
  - A large, geographically dispersed group of researchers
  - Require access to huge amounts of data
- Solution
  - Services for handling remote access to large data sets (Storage System) in a grid environment
  - Aimed at Data Intensive Grid Applications
    - High Energy Physics, Astronomy, Climate modeling, BioInformatics, many others

Background: Data Grid Requirement
- Seamless access to data and information stored at local and remote sites
- Virtualization of data, collection and meta information
- Dataset Scaling – size & number
- Integrate Data Collections & Associated Metadata
- Multiplicity of Platforms, Resource & Data Types
- Authentication, Access Control, Auditing Facilities
- Handling Legacy Data & Methods

Existing Storage Management Systems in Data Grid
- Data storage resource management systems
  - DPSS, HPSS: focus on high-performance access, utilize parallel data transfer, striping
  - SRB: connects heterogeneous data collections, uniform client interface, metadata queries
  - DFS: focus on high-volume usage, dataset replication, local caching
  - Globus data grid support: interface to many storage systems, common extensible transfer protocol
- Incompatible Existing Storage Systems

DPSS (Distributed-Parallel Storage System)

DPSS (A data cache storage server)
- Developed by LBNL with support from DoE
- A network data cache to provide high-speed parallel access to remote, large, image-like, read-mostly data (cache, not a storage system)
- At the application level, the DPSS is a semi-persistent cache of named data-objects, and at the storage level it is a logical block server
- Parallel Transfer (parallelism across many components), Pipeline support, Network Tuning, Agent-based Management
- Not appropriate for small block R/W
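
Note (illustrative, not from the slides): the "logical block server" plus "parallel transfer" idea can be sketched as a named object split into fixed-size logical blocks that are spread over several servers and fetched concurrently. The round-robin placement, the block size, and the `BlockServer` class below are assumptions made for this sketch; they are not DPSS's actual layout or API.

```python
from concurrent.futures import ThreadPoolExecutor

BLOCK_SIZE = 64 * 1024  # assumed block size, chosen only for illustration


def locate_block(block_no, n_servers):
    """Round-robin striping: return (server index, block index on that server)."""
    return block_no % n_servers, block_no // n_servers


def blocks_for_range(offset, length, block_size=BLOCK_SIZE):
    """Logical block numbers touched by the byte range [offset, offset + length)."""
    if length <= 0:
        return []
    first = offset // block_size
    last = (offset + length - 1) // block_size
    return list(range(first, last + 1))


class BlockServer:
    """In-memory stand-in for one block server (hypothetical)."""

    def __init__(self):
        self.blocks = {}

    def read(self, local_no):
        return self.blocks[local_no]


def write_striped(servers, data, block_size=BLOCK_SIZE):
    """Cut data into logical blocks and place them round-robin on the servers."""
    n_blocks = (len(data) + block_size - 1) // block_size
    for block_no in range(n_blocks):
        server, local_no = locate_block(block_no, len(servers))
        start = block_no * block_size
        servers[server].blocks[local_no] = data[start:start + block_size]


def read_range(servers, offset, length, block_size=BLOCK_SIZE):
    """Fetch all blocks covering the byte range concurrently, then trim to the range."""
    block_nos = blocks_for_range(offset, length, block_size)
    if not block_nos:
        return b""

    def fetch(block_no):
        server, local_no = locate_block(block_no, len(servers))
        return servers[server].read(local_no)

    with ThreadPoolExecutor(max_workers=len(servers)) as pool:
        joined = b"".join(pool.map(fetch, block_nos))  # map keeps block order
    start = offset - block_nos[0] * block_size
    return joined[start:start + length]
```

With large sequential reads every server contributes blocks to each request, which is where the parallelism pays off; a read smaller than one block touches a single server, consistent with the remark that the design is not appropriate for small block R/W.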


DPSS(Architecture)

Source : LBNL


DPSS(An Architectural Model)

Source : LBNL


DPSS(Overall Architecture)

Source : LBNL

Data Cache Research Items
- How and when to migrate files
- When is it better to move the processing to the data, instead of the data to the processing
- How to reserve space on the data cache
- How to achieve high data rates across wide area networks
- How to provide a global data set name space
- How to ensure efficient data set consistency


SRB(Storage Resource Broker)

Overview
- Developed at the San Diego Supercomputer Center (SDSC)
- A middleware to provide distributed clients with uniform access to diverse storage resources, including:
  - Unix file system
  - Archival storage systems such as UniTree and HPSS
  - Database objects managed by various DBMS including DB2, Oracle and Illustra
- MCAT (Metadata Catalogue) to facilitate the brokering
  - SRB metadata is managed by an MCAT server
  - Stores metadata associated with data sets, users (access control) and resources

Architecture

[Figure: SRB architecture diagram. Labels recovered from the diagram:]
- Application: C, C++, Linux I/O; Unix Shell; Java, NT Browsers; Web, Prolog, Python
- SRB: Remote Proxies; DataCutter; Third-party copy
- MCAT: Dublin Core; Resource, User; User Defined; Application Meta-data
- HRM
- Archives: HPSS, ADSM, UniTree, DMF
- Databases: DB2, Oracle, Sybase
- File Systems: Unix, NT, Mac OS X

Source: SDSC

Concept
- Abstraction of User Space
  - Single sign-on, Multiple authentication schemes
- Virtualization of Resources
  - Resource Location, Type & Access transparency
  - Logical Resource Definitions - bundling
- Abstraction of Data and Collections
  - Virtual Collections: Persistent Identifier and Global Name Space
  - Replication & Segmentation
- Data Discovery – System & application metadata
  - User-defined Metadata
  - Attribute-based Access (path names become irrelevant)
- Uniform Access Methods
  - APIs, Command Line, GUI Browsers, Web-Access (Portal, CGI)
  - Parallel Access with both Client and Server-driven strategies

MCAT (Metadata Catalog)
- Stores metadata about
  - Data sets, Collections, Users, Resources, Proxy Methods
- Maintains replica information for data & containers
- Provides “Collection” abstraction for data
- Provides “Global User” name space & authentication
- Provides Authorization through ACL & tickets
- Maintains Audit trail on data & collections
- Maintains metadata for methods and resources
- Provides Resource Transparency - logical resources
- Implemented as a relational database
  - Oracle or DB2 or Sybase
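
Note (illustrative, not from the slides): attribute-based access against a relational catalog could look roughly like the sketch below. The three-table schema, the function names, and the sample values in the usage note are hypothetical and are not MCAT's real schema; SQLite stands in for Oracle, DB2 or Sybase only so that the example runs on its own.

```python
import sqlite3


def build_catalog():
    """Create a toy in-memory catalog (hypothetical schema, not MCAT's)."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE dataset   (id INTEGER PRIMARY KEY, logical_name TEXT UNIQUE);
        CREATE TABLE attribute (dataset_id INTEGER, name TEXT, value TEXT);
        CREATE TABLE replica   (dataset_id INTEGER, resource TEXT, physical_path TEXT);
    """)
    return db


def register(db, logical_name, attributes, replicas):
    """Record a data set, its user-defined attributes, and its physical replicas."""
    cur = db.execute("INSERT INTO dataset (logical_name) VALUES (?)", (logical_name,))
    ds_id = cur.lastrowid
    db.executemany("INSERT INTO attribute VALUES (?, ?, ?)",
                   [(ds_id, k, v) for k, v in attributes.items()])
    db.executemany("INSERT INTO replica VALUES (?, ?, ?)",
                   [(ds_id, res, path) for res, path in replicas])
    return ds_id


def find_by_attribute(db, name, value):
    """Return (logical name, resource, physical path) rows matching one attribute."""
    return db.execute("""
        SELECT d.logical_name, r.resource, r.physical_path
        FROM dataset d
        JOIN attribute a ON a.dataset_id = d.id
        JOIN replica   r ON r.dataset_id = d.id
        WHERE a.name = ? AND a.value = ?
        ORDER BY d.logical_name, r.resource
    """, (name, value)).fetchall()
```

A client that calls `find_by_attribute(db, "experiment", "climate")` never supplies a path; the catalog resolves the attribute to a logical name and then to every registered replica, which is the sense in which path names become irrelevant.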

Research Items
- Large Datasets; Large Number of Datasets; Scaling
- Distributed, Heterogeneous Storage, Handling Legacy Data and Methods
- Discovery and Search, Fault Tolerance and Load Distribution, Replication
- Scheduling, Caching & Data Placements, Data Migration over Time & Space
- Uniform Name Space
- Types of Metadata
  - XML to unstructured
  - Standardized to User-defined Metadata
  - Large Number of Attributes
- Presentation – user friendly, Maintenance

DPSS + HPSS + SRB
- HPSS
  - High Performance Storage System
- HPSS + DPSS
  - HPSS <-> DPSS Integration
- Works on integrating the DPSS into several Grid-like tools
  - SRB + DPSS: DPSS can be used as an SRB device
  - Globus + DPSS
  - SRB + Globus

Source: LBNL


Data Management in Globus

Overview
- Interface to many storage systems (DPSS, file systems, SRB)
- Decouple low-level data transfer mechanisms from storage services
- Three Major Components
  - Data Transport and Access: GridFTP based on GSI (Grid Security Infrastructure)
  - Data Replication: a Replica Location Service and Replica Management
  - Global Access to Secondary Storage (GASS): allows applications to access data stored in any remote filesystem by specifying a URL (HTTP URL or x-gass URL)
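
Note (illustrative, not from the slides): addressing remote data by URL means a client can choose the transport from the URL scheme alone. The sketch below uses only the Python standard library; the `TRANSPORTS` table and the `resolve` function are hypothetical and do not represent the GASS API.

```python
from urllib.parse import urlsplit

# Hypothetical mapping from URL scheme to the kind of server a client would contact.
TRANSPORTS = {
    "http": "HTTP server",
    "x-gass": "GASS server",
    "ftp": "FTP server",
}


def resolve(url):
    """Split a data URL into the pieces a client needs to open a connection."""
    parts = urlsplit(url)
    if parts.scheme not in TRANSPORTS:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    return {
        "transport": TRANSPORTS[parts.scheme],
        "host": parts.hostname,
        "port": parts.port,   # None when the URL carries no explicit port
        "path": parts.path,
    }
```

Application code written against such a resolver stays the same whether the file sits behind an HTTP server or a GASS server; only the URL changes.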


Architecture

Source : Globus, ANL

Major Functions (1)
- Data Transport and Access (GridFTP)
  - PKI or Kerberos support
  - Third-party control of data transfer
  - Parallel data transfer
  - Striped data transfer
  - Partial file transfer
  - Automatic negotiation of TCP buffer/window sizes
  - Support for reliable and re-startable data transfer
  - Integrated instrumentation, for monitoring ongoing transfer performance
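
Note (illustrative, not from the slides): parallel and partial transfer both rest on moving independent byte ranges of a file. The sketch below divides a file into contiguous ranges and copies them concurrently between two local files, which stand in for the two ends of a transfer. It does not speak the GridFTP protocol, and all function names are hypothetical.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def split_ranges(size, streams):
    """Divide [0, size) into at most `streams` contiguous (offset, length) ranges."""
    if size < 0 or streams < 1:
        raise ValueError("size must be >= 0 and streams >= 1")
    base, extra = divmod(size, streams)
    ranges, offset = [], 0
    for i in range(streams):
        length = base + (1 if i < extra else 0)  # spread the remainder over the first ranges
        if length:
            ranges.append((offset, length))
            offset += length
    return ranges


def copy_range(src, dst, offset, length):
    """Partial transfer: copy one byte range, independent of all other ranges."""
    with open(src, "rb") as fin, open(dst, "r+b") as fout:
        fin.seek(offset)
        fout.seek(offset)
        fout.write(fin.read(length))
    return length


def parallel_copy(src, dst, streams=4):
    """Copy src to dst using up to `streams` concurrent range copies; return bytes copied."""
    size = os.path.getsize(src)
    with open(dst, "wb") as f:
        f.truncate(size)  # pre-size the destination so every range can be written in place
    with ThreadPoolExecutor(max_workers=streams) as pool:
        return sum(pool.map(lambda r: copy_range(src, dst, *r), split_ranges(size, streams)))
```

Because each range is self-describing, a failed range can be retried on its own, which is also the basis for re-startable transfers.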

Major Functions (2)
- Data Replication
  - Maintain a mapping between logical names for files and collections and one or more physical locations
  - Low-level replica location and high-level reliable replication
  - Combine with GIS (NWS, MDS) to build replica selection service (find best replica)
- GASS
  - Libraries and utilities are provided to eliminate the need to
    - manually log in to sites and ftp files
    - install a distributed file system
  - Currently the ftp and x-gass (GASS server) protocols are supported
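
Note (illustrative, not from the slides): a replica catalog maps one logical name to several physical locations, and a selection service picks the "best" one using network forecasts of the kind NWS provides. The cost model below (latency plus size divided by bandwidth), the `Replica` fields, and the URLs in the usage note are assumptions for this sketch and are not the Globus implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Replica:
    url: str                # physical location of one copy
    bandwidth_mbps: float   # forecast throughput to the client, must be > 0 (assumed input)
    latency_ms: float       # forecast round-trip latency (assumed input)


def estimated_seconds(replica, size_mb):
    """Rough transfer-time estimate: start-up latency plus size over bandwidth."""
    return replica.latency_ms / 1000.0 + (size_mb * 8.0) / replica.bandwidth_mbps


def select_replica(catalog, logical_name, size_mb):
    """Look up every replica of a logical name and return the cheapest one to fetch."""
    replicas = catalog.get(logical_name)
    if not replicas:
        raise KeyError(f"no replica registered for {logical_name!r}")
    return min(replicas, key=lambda r: estimated_seconds(r, size_mb))
```

For example, given a nearby replica at 100 Mbit/s with 5 ms latency and a distant one at 400 Mbit/s with 80 ms latency, a 1000 MB file is estimated at about 80 s from the first and about 20 s from the second, so the distant replica wins; for a 0.1 MB file the latency term dominates and the nearby replica wins.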

Operation of SRB and Globus

[Figure: two side-by-side component stacks. Labels recovered from the diagram:]
- Globus: Application; Client and Replica Catalog; FTP Daemon; Storage System
- SRB: Application; SRB Client; Metadata Catalog; SRB Server; Storage System

Source: GGF Performance & Information WG


Data Handling Implementations

Source : GGF Performance & Information WG


Our Approach


Architectural Model

Test System

[Figure: test-bed diagram. Labels recovered from the diagram:]
- PC, P-III, Linux: CD-ROM, CD-ROM, IDE HDD, IDE HDD, F. Drive
- PC, P-III, Windows: CD-ROM, CD-ROM, IDE HDD, IDE HDD, Zip Drive
- PC, P-IVI, Windows: CD-ROM, CD-ROM, IDE HDD, IDE HDD, F. Drive
- PC, AMD, Linux: CD-ROM, CD-ROM, IDE HDD, IDE HDD, F. Drive
- File Server, Linux: Dual NIC, MySQL, External
- Network Switch

Replica Management
- Replica Creation/Location/Registration
- Replica Discovery/Lookup/Selection
- Replica Deletion and Consistency
- Replication and Load Balancing
- Replication and Robustness (Redundancy)
- Expression of Replica Data (Metadata)
- Replication Overlay Network

A Scenario
- Replica Selection Process
- Read Process

Concluding Remarks
- Introduction to Storage Management in Grid
  - DPSS, SRB, Globus Data Grid support
- Our approach for Storage management system
  - Develop Dynamic Data (Replica) Selection and Scheduling model for data intensive Grid applications