San Diego Supercomputer Center San Diego Supercomputer Center SDSC Storage Resource Broker A Data Storage Language for the Requirements of Rebels and Misfits Arun Jagatheesan San Diego Supercomputer Center University of California, San Diego HPTS Workshop Asilomar, California, 25-28 September 2005 Or A talk on Data Grids and DGL


Page 1

San Diego Supercomputer Center
SDSC Storage Resource Broker

A Data Storage Language for the Requirements of Rebels and Misfits
Or: A talk on Data Grids and DGL

Arun Jagatheesan
San Diego Supercomputer Center
University of California, San Diego

HPTS Workshop, Asilomar, California, 25-28 September 2005

Page 2

Talk Outline
• “Next Hype in Grids”
  • My belief system before we begin
  • Meet my friends – Rebels and Misfits
• File Systems, Databases, Datagrids
  • Mapping physical data to logical view
  • Mapping physical data and storage to logical view
  • SRB Statistics
  • Mapping physical data, storage and processes to logical view
• Data Grid Language
• Conclusion
  • What Now = work and sacrifices; What Next = Vision

He has 44 slides and 20 minutes. No infotainment slides either – Boring!

Page 3

Disclaimer and Warning
• My own opinion or thoughts
  • Arun says so… (can be wrong?)
• Based on my current knowledge and understanding
  • As of September 2005 – current knowledge and level of understanding (can change?)
• My belief system
  • I believe in Data Grids for Inter/Intra/Multi-Organizational Unstructured Data Management (biased?)
  • My belief might not be in sync with your belief, but it can co-exist with your favorite technology

Page 4

Meet my friends – Rebels and Misfits
• Esoteric Requirements from “High-end” users
  • To keep them alive, they need more… more of everything
  • Requirements not broadly felt or required in industry
  • They push the existing technology to the limits
• From the existing technology’s perspective…
  • These folks are nuts!
  • The existing technology was not designed for these requirements
  • My friends become rebels or misfits from the existing technology’s perspective


Page 6

Mapping physical data to logical view
Hierarchical view, independent of network, disk, sector, track, fragments
Rule: Storage Abstraction – Hide storage resources

Page 7

Mapping physical data to logical view
Relational view (assume it's a database), independent of network, disk, sector, track, fragments
Thanks to rebels and misfits in the airline industry who wanted transactional capabilities


Page 9

NIH BIRN SRB Data Grid
• Biomedical Informatics Research Network
  • Access and analyze biomedical image data
  • Data resources distributed throughout the country
  • Medical schools and research centers across the US
• Stable high performance grid based environment
  • Coordinate data sharing
  • Federate collections
  • Support data mining and analysis

Page 10

Mapping distributed data & storage to logical view
25 Universities or Research Hospitals, Multiple heterogeneous storage resources

Page 11

Approach we have taken in Data Grids
• Logical Schema (view) is independent of physical schema
  • Just like databases or even file systems
• Physical Resources are provided in the form of logical resources in the logical view
  • This is very different from databases (may be similar to tablespaces)
• A database is used for mapping
  • Data path, network, access permissions, metadata, storage type, logical storage resource, physical storage resources
  • Used for digital libraries, persistent archives and data grids
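The mapping role of the catalog database can be sketched in a few lines of Python. This is a toy illustration only, not the SRB catalog schema or API: the class names `Replica` and `Catalog`, the resource names, and the physical paths are all hypothetical. It shows a logical path being resolved to physical replicas through a logical resource, after an access-permission check.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Replica:
    """One physical copy of a logical file (all fields are illustrative)."""
    physical_resource: str  # hypothetical name of a disk or tape system
    physical_path: str      # path as seen by that storage system


@dataclass
class Catalog:
    """Toy stand-in for the mapping database described on the slide."""
    # logical resource name -> physical resources behind it
    logical_resources: dict[str, list[str]] = field(default_factory=dict)
    # logical path -> replicas of that file
    replicas: dict[str, list[Replica]] = field(default_factory=dict)
    # logical path -> users allowed to read it
    readers: dict[str, set[str]] = field(default_factory=dict)

    def register(self, logical_path, replica, reader):
        """Record a replica of a logical file and grant one user read access."""
        self.replicas.setdefault(logical_path, []).append(replica)
        self.readers.setdefault(logical_path, set()).add(reader)

    def resolve(self, logical_path, user, logical_resource):
        """Return the replicas of logical_path held on the given logical resource."""
        if user not in self.readers.get(logical_path, set()):
            raise PermissionError(f"{user} may not read {logical_path}")
        allowed = set(self.logical_resources.get(logical_resource, []))
        return [r for r in self.replicas.get(logical_path, [])
                if r.physical_resource in allowed]


# Usage: one logical file with two physical copies at different sites.
catalog = Catalog(logical_resources={"fast-disk": ["site-a-disk"],
                                     "archive": ["site-b-tape"]})
path = "/home/arun.sdsc/exp1/text1.txt"
catalog.register(path, Replica("site-a-disk", "/vault/0001"), "arun")
catalog.register(path, Replica("site-b-tape", "/tape/7/0001"), "arun")
print(catalog.resolve(path, "arun", "archive"))
```

The user only ever names the logical path and the logical resource; which physical system answers is decided by the catalog lookup.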

Page 12

The “Grid” Vision

Page 13

Data Grid Resource Providers
Grid Resource Providers (GRP) providing content and/or storage
[Diagram labels: GRP; GRP; /txt3.txt]

Page 14

Data Grid Administrative Domain
• Administrative domain with one or more Grid Resource Providers
• Could include their data centers
[Diagram labels: Research Lab; GRP; GRP; /txt3.txt]

Page 15

Data Grid Administrative domains
[Diagram: several GRP nodes grouped into administrative domains; files shown: /…/text1.txt, /…//text2.txt, /txt3.txt]
• Storage-R-Us Resource Providers: data + storage (50)
• Research lab: data + storage (40)
• University: data + storage (10)

Page 16

Data Grid: Logical view of data & resources
[Diagram: several GRP nodes grouped into administrative domains; files shown: /…/text1.txt, /…//text2.txt, /txt3.txt]
• Storage-R-Us Resource Providers: data + storage (50)
• Research Lab: data + storage (40)
• University: data + storage (10)

Logical Namespace (need not be the same as the physical view of resources):
/home/arun.sdsc/exp1
/home/arun.sdsc/exp1/text1.txt
/home/arun.sdsc/exp1/text2.txt
/home/arun.sdsc/exp1/text3.txt
data + storage (100)

Page 17

BIRN: Inter-organizational Data

Page 18

SDSC SRB User Community (Major US)
• BaBar, Stanford Linear Accelerator Center (SLAC)
• California Digital Library (CDL)
• Center for Integrated Space Weather Modeling (CISM)
• CVC, Visualization Portal
• LDC Data Storage
• NIH Biomedical Informatics Research Network (BIRN)
• NSF Southern California Earthquake Center (SCEC)
• National Archives and Records Administration (NARA)
• National Aeronautics and Space Administration Centers (NASA)
• National Virtual Observatory (NVO)
• Npackage, NSF Middleware Initiative (NMI)
• National Science Digital Library (NSDL)
• National Optical Astronomy Observatory (NOAO)
• ROADNet
• Purdue University
• SCCOOS, USA
• Scientific Rich Media Archive
• Salk Institute
• Strand Map Service, USA
• UC Berkeley Library
• UCSD Library
• University of Houston
• Persistent Archives Test bed
• University of Wisconsin, Madison
• WebBase, Stanford University
• Yale University Library

Page 19

SDSC SRB User Community
• Academia Sinica, Taiwan
• Australian National University
• Bio-Lab, University of Genoa, Italy
• Council for the Central Laboratory of the Research Councils (CCLRC), UK
• CC-IN2P3, France
• Distributed Framework, Singapore
• Distributed Aircraft Maintenance Environment (DAME), UK
• eMinerals Project, UK
• eScience, Belfast Center
• Fraunhofer ITWM, Germany
• High Energy Accelerator Organization, KEK, Japan
• K* Grid Computing, Korea
• KEK Computing Center, Japan
• Lyon, France
• NorGrid, Norway
• Nanyang Data Grid, Singapore
• NCHC, Taiwan
• Queensland University of Technology (QUT), Australia
• Rutherford Appleton Laboratory (RAL), UK
• T-Systems, Germany
• UK eScience Project, UK
• UniGrid, Poland
• UMK, Poland
• Virtual Laboratory for eScience, Netherlands

Page 20

Total data brokered by SDSC SRB
[Chart: counts from 0 to 14 by data size category (> 100 TB, > 10 TB, > 5 TB, > 1 TB, > 500 GB, < 200 GB); labels: Response, Unique, Outside SDSC]
Figures shown: 324 TB, 358 TB, 682 TB


Page 22

Mapping distributed data, storage and processes to logical view

Page 23

Long-run Processes in Data Grid
• Data Grid ILM
• Data Grid Triggers
• Data Gridflows

Page 24

Data Grid (Enterprise Utility)
[Diagram labels: ABCZ.com US; ABCZ.com Asia; Data center; IT Department US; IT Department Asia; 3rd Party]
Physical Resources managed by autonomous administrative domains of the same enterprise (ABCZ.com)

Page 25

Data Grid (Enterprise Utility)
[Diagram labels: ABCZ.com US; ABCZ.com Asia; Data center; IT Department US; IT Department Asia; 3rd Party; Project 1; Project 2]
Each project has a data grid instance consisting of Logical Resources with different SLAs offered by the IT department

Page 26

Change is Constant
• Changes in access patterns
  • Based on number of users accessing the data
  • Domains which want to access data
• Data Value
  • The value of a data set (collections?) for a particular domain, based on its business model and users’ access patterns
  • Each domain will have a different value based on its users and its role in a data grid

Page 27

“Data Value” based on users
[Diagram labels: ABCZ.com US; ABCZ.com Asia; Data center; IT Department US; IT Department Asia; 3rd Party; Project1; Project2; Project3; Project4]
When more users access a project’s data, its data value increases, so move that data to a faster storage type

Page 28

“Data Value” based on domain
[Diagram labels: ABCZ.com US; ABCZ.com Asia; Data center; IT Department US; IT Department Asia; 3rd Party; Project1; Project2; Project3; Project4]
When more users from the same domain access the data, the data value for that particular data in that particular domain increases, so replicate the data to resources in that domain. (The converse is also true.)
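A minimal sketch of the domain-based rule, assuming a plain per-domain access count stands in for "data value". The threshold `REPLICATE_AT`, the function names, and the idea of a protected home domain are assumptions made for illustration; the slides do not specify how value is measured.

```python
from collections import Counter

REPLICATE_AT = 3  # assumed threshold: accesses from one domain before replicating


def domains_needing_replica(access_log, replica_domains, threshold=REPLICATE_AT):
    """Return domains whose users access the data often but hold no replica.

    access_log: iterable of domain names, one entry per access.
    replica_domains: set of domains that already store a replica.
    """
    per_domain = Counter(access_log)  # per-domain proxy for "data value"
    return sorted(d for d, n in per_domain.items()
                  if n >= threshold and d not in replica_domains)


def domains_to_trim(access_log, replica_domains, home_domain):
    """Converse rule: replicas in domains with no accesses are removal candidates."""
    per_domain = Counter(access_log)
    return sorted(d for d in replica_domains
                  if per_domain[d] == 0 and d != home_domain)


# Usage: Asia reads the data three times but only the US holds a copy.
log = ["asia", "us", "asia", "asia"]
print(domains_needing_replica(log, replica_domains={"us"}))
```

A domain with a preservation role, like the 3rd party data center, would need to be exempted from the converse rule, since it keeps replicas regardless of access counts.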

Page 29

“Data Value” based on role
[Diagram labels: ABCZ.com US; ABCZ.com Asia; Data center; IT Department US; IT Department Asia; 3rd Party; Project1; Project2; Project3; Project4]
The 3rd party data center has no users who use the data, but is interested in having a replica of any data (or deleted data) for long-term preservation

Page 30

Data Grid ILM
• ILM = Information Lifecycle Management (Sales Jargon)
• Dynamic re-orientation of data placement and data retention policies (rules)
• Based on “business value of data” and storage cost
• HSM = Hierarchical Storage Management, based on “data freshness”. ILM goes one step further
• Applying this concept on a Data Grid is very tricky, as different autonomous domains have different business rules

Page 31

Data Grid Triggers
• Similar to triggers in databases
• Based on ECA concepts
  • Event
  • Condition
  • Action
• Example
  • Event = Insert new file in collection (“/ourProject/data”)
  • Condition = (color = “blue” && galaxy = “Andromeda”)
  • Action = Run (selectiveDataReplicator.dgl)
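The Event-Condition-Action example on this slide can be sketched as follows. This is an illustration in Python, not SRB or DGL code: the `Trigger` class, the `fire` function, and the `launched` list (which stands in for actually running `selectiveDataReplicator.dgl`) are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Trigger:
    event: str                         # e.g. "insert"
    collection: str                    # collection the trigger watches
    condition: Callable[[dict], bool]  # test over the new file's metadata
    action: Callable[[str], None]      # what to run when the condition holds


def fire(triggers, event, path, metadata):
    """Run the action of every trigger matching this event; return how many ran."""
    ran = 0
    for t in triggers:
        in_collection = path.startswith(t.collection.rstrip("/") + "/")
        if t.event == event and in_collection and t.condition(metadata):
            t.action(path)
            ran += 1
    return ran


launched = []  # stands in for launching selectiveDataReplicator.dgl

replicate_blue_andromeda = Trigger(
    event="insert",
    collection="/ourProject/data",
    condition=lambda m: m.get("color") == "blue" and m.get("galaxy") == "Andromeda",
    action=lambda path: launched.append(("selectiveDataReplicator.dgl", path)),
)

# Usage: inserting a matching file runs the action once.
fire([replicate_blue_andromeda], "insert", "/ourProject/data/img1.fits",
     {"color": "blue", "galaxy": "Andromeda"})
print(launched)
```

Keeping the condition and action as separate callables mirrors the ECA split: the event and collection select candidate triggers cheaply, and the condition is evaluated only for those.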


Page 33

Data Grid Language
• Requirement
  • Data Grid ILM process
    • The long run process that has to be run is described in DGL
  • Data Grid Triggers
    • Action part of the ECA (Event-Condition-Action) logic
  • Data Gridflows
    • Step by step execution of long run process on Data Grid
• Analogy of SQL in relational databases
  • Long-run procedures stored and executed in the Data Grid itself
  • Captures the “Infrastructure Execution Logic”

Page 34

DGL Request
[Schema diagram callouts:]
• Annotations about the Data Grid Request
• Can be either a Flow or a Status Query

Page 35

DGL Requests (2 types)
• Data Grid Flow
  • An XML Structure that describes the execution logic, associated procedural rules and DGL variables. Can be a synchronous or asynchronous flow
• Status Query
  • An XML Structure used to query the execution status of any gridflow or a sub-flow at any granular level. Status Queries can be made for both synchronous and asynchronous flows

Page 36

Flow
[Schema diagram callouts:]
• Scoped Variables that can control the flow
• Logic used by the sub-members
• Sub-members that are the real execution statements
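The three parts of a flow (scoped variables, logic, sub-members), together with a status query at any level, can be mimicked with a small recursive structure. DGL itself is an XML language; the Python classes below are a hypothetical model for illustration and do not reproduce the DGL schema. Only sequential logic is sketched, and execution here is synchronous.

```python
from dataclasses import dataclass, field
from typing import Callable, Union


@dataclass
class Step:
    """A leaf sub-member: one real execution statement."""
    name: str
    run: Callable[[dict], None]
    status: str = "pending"


@dataclass
class Flow:
    """A flow: scoped variables, logic, and sub-members (steps or sub-flows)."""
    name: str
    variables: dict = field(default_factory=dict)
    logic: str = "sequential"  # only this one mode is sketched here
    members: list[Union["Flow", Step]] = field(default_factory=list)
    status: str = "pending"

    def execute(self, inherited=None):
        """Run sub-members in order; inner variables shadow outer ones."""
        scope = {**(inherited or {}), **self.variables}
        self.status = "running"
        for m in self.members:
            if isinstance(m, Flow):
                m.execute(scope)
            else:
                m.status = "running"
                m.run(scope)
                m.status = "done"
        self.status = "done"

    def query_status(self, path):
        """Status query at any granularity: path is a list of member names."""
        node = self
        for name in path:
            node = next(m for m in node.members if m.name == name)
        return node.status


# Usage: an outer flow with one step and one sub-flow that overrides a variable.
seen = []
inner = Flow("replicate", variables={"target": "archive"},
             members=[Step("copy", lambda s: seen.append(s["target"]))])
outer = Flow("ingest", variables={"target": "disk"},
             members=[Step("checksum", lambda s: seen.append(s["target"])), inner])
outer.execute()
print(seen, outer.query_status(["replicate", "copy"]))
```

An asynchronous flow would return a request identifier immediately and let the caller poll with the same kind of status query; that part is left out of the sketch.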

Page 37

Flow Logic (How a flow executes)

Page 38

DGL-Response

Responses can be synchronous or asynchronous


Page 40

Conclusion

• Data Grids are for real – they manage Inter/Intra/Multi-organizational unstructured data (files, streams, …)

• Data Grids extend the database concepts and internally use a database

• A language like Data Grid Language mentioned here is necessary for the proliferation and automation of Data Grid Management Systems (DGMS)

• Reference: Paper in VLDB Workshop on Data Management in Grids

Page 41

We are SDSC SRB
Arun is here! – Shameless self promotion
Not in picture: Many students

Page 42

Additional Thanks (Ignorance is bliss)
• My Advisor: “You already graduated, and have a job at a research firm. Now why are you writing to MS Research? Whom did you write to?”
• Me: “I wrote to two people. The first person works on social communities; we can use service brokering for them. I have not got any response from him. But there is another person who did respond. His last name is of the color “Gray” and his web page is very cheesy with music in the background. I guess he does not do much computer science – he works with astronomers.”

Page 43

Contact Info
Arun Jagatheesan
[email protected]
Or
[email protected]
http://www.sdsc.edu/srb/