SDSC Storage Resource Broker: Data Grid Automation
(Or: What is SRB Matrix?)

Arun Jagatheesan et al., San Diego Supercomputer Center, University of California, San Diego

VLDB Workshop on Data Management in Grids, Trondheim, Norway, 2-3 September 2005




Talk Outline

• Data Grid landscape
• Long-run data management processes
  • Data Grid ILM
  • Data Grid Triggers
  • Dataflow Pipelines
• Execution logic: Data Grid Language
• End-to-end infrastructure deployment
  • API
  • User GUI
• Service-oriented *infrastructure*


Data Grid Landscape


The “Grid” Vision


Data Grid Resource Providers

Grid Resource Providers (GRPs) provide content and/or storage.

[Figure: several GRPs, one of which holds the file /txt3.txt]


Data Grid Administrative Domain

• An administrative domain with one or more GFS Resource Providers
• Could include its own data centers

[Figure: a Research Lab administrative domain containing GRPs, one of which holds /txt3.txt]


Data Grid Administrative domains

[Figure: GRPs spread across three administrative domains, holding files such as /…/text1.txt, /…//text2.txt, and /txt3.txt]

• Storage-R-Us Resource Providers: data + storage (50)
• Research lab, Taiwan: data + storage (40)
• University: data + storage (10)


Data Grid (Enterprise Utility)

[Figure: ABCZ.com US and ABCZ.com Asia (IT Department US, IT Department Asia) and a 3rd-party data center]

Physical resources are managed by autonomous administrative domains of the same enterprise (ABCZ.com).


Data Grid (Enterprise Utility)

[Figure: ABCZ.com US/Asia IT departments and a 3rd-party data center, hosting Project 1 and Project 2]

Each project has a data grid instance consisting of logical resources with different SLAs offered by the IT department.


Data Grid (Enterprise Utility)

[Figure: ABCZ.com US/Asia IT departments and a 3rd-party data center, hosting Project1 through Project4]


Long-run Processes in Data Grid

• Data Grid ILM
• Data Grid Triggers
• Data Gridflows


Data Grid ILM


Change is Constant

• Changes in access patterns
  • Based on the number of users accessing the data
  • Based on the domains that want to access the data
• Data value
  • The value of a data set (collections?) for a particular domain, based on its business model and users’ access patterns
  • Each domain will assign a different value, based on its users and its role in the data grid


“Data Value” based on users

[Figure: ABCZ.com US/Asia IT departments and a 3rd-party data center, hosting Project1 through Project4]

When more users access a project’s data, its data value increases, so that data is moved to a faster storage type.
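
A minimal sketch of a user-count tiering rule follows. The tier names (`tape`, `disk`, `fast-disk`), the thresholds, and the `ProjectData`/`rebalance` names are all hypothetical; the slide only says that more users means higher value and a faster storage type. As written, the rule also moves data back down when its user count drops.

```python
from dataclasses import dataclass

# Hypothetical storage tiers, ordered slowest/cheapest to fastest.
TIERS = ["tape", "disk", "fast-disk"]

# Hypothetical thresholds: minimum number of active users for each tier.
THRESHOLDS = {"tape": 0, "disk": 10, "fast-disk": 100}


@dataclass
class ProjectData:
    name: str
    tier: str
    active_users: int


def choose_tier(active_users: int) -> str:
    """Return the fastest tier whose user threshold is met."""
    chosen = TIERS[0]
    for tier in TIERS:
        if active_users >= THRESHOLDS[tier]:
            chosen = tier
    return chosen


def rebalance(data: ProjectData) -> bool:
    """Move data to the tier matching its current value; return True if it moved."""
    target = choose_tier(data.active_users)
    if target != data.tier:
        # A real data grid would issue a physical move or replication here.
        data.tier = target
        return True
    return False
```

For example, `ProjectData("Project1", "disk", 150)` is moved to `fast-disk` by `rebalance`.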


“Data Value” based on domain

[Figure: ABCZ.com US/Asia IT departments and a 3rd-party data center, hosting Project1 through Project4]

When more users from the same domain access the data, the value of that data in that domain increases, so the data is replicated to resources in that domain. (The converse is also true.)
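
The per-domain rule can be sketched as a replica-placement function. This is only an illustration. The access log format (one domain name per access), the `home` domain that always keeps a copy, and the two thresholds are assumptions. The gap between `add_at` and `drop_below` is a hysteresis band, which keeps a replica from flapping when access hovers around one threshold.

```python
from collections import Counter
from typing import Iterable, Set


def plan_replicas(access_log: Iterable[str], current: Set[str], home: str,
                  add_at: int = 20, drop_below: int = 5) -> Set[str]:
    """Return the set of domains that should hold a replica.

    access_log: one entry per access, naming the accessing user's domain.
    current:    domains that already hold a replica.
    home:       hypothetical home domain whose copy is never removed.
    A new replica is added at >= add_at accesses; an existing one is
    dropped when its domain falls below drop_below accesses.
    """
    counts = Counter(access_log)
    wanted = {home}  # never drop the home copy
    for domain in current | set(counts):
        n = counts[domain]
        if n >= add_at or (domain in current and n >= drop_below):
            wanted.add(domain)
    return wanted
```

For example, 25 accesses from `Asia` against `current={"US"}` with `home="US"` yields `{"US", "Asia"}`. If those accesses later fall to 3, the Asia replica is planned for removal.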


“Data Value” based on role

[Figure: ABCZ.com US/Asia IT departments and a 3rd-party data center, hosting Project1 through Project4]

The 3rd-party data center has no users of the data, but it wants a replica of any data (including deleted data) for long-term preservation.
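
The role-based case can be sketched as a toy replica map. Every ingest also places a copy at the preservation domain, and a user-level delete removes only the working copies. The `Grid` class and the domain name `3rd-party` are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Grid:
    """Toy model: which domains hold a replica of each logical path."""
    replicas: Dict[str, Set[str]] = field(default_factory=dict)
    archive: str = "3rd-party"  # hypothetical preservation domain

    def ingest(self, path: str, domain: str) -> None:
        # Every new object is also replicated to the archive domain.
        self.replicas[path] = {domain, self.archive}

    def delete(self, path: str) -> None:
        # A user-level delete drops working copies but keeps the archive copy.
        if path in self.replicas:
            self.replicas[path] = {self.archive}

    def visible(self, path: str) -> bool:
        """True while some non-archive domain still serves the object to users."""
        return bool(self.replicas.get(path, set()) - {self.archive})
```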


Data Grid ILM

• ILM = Information Lifecycle Management
  • Dynamic re-orientation of data placement and data retention policies (rules)
  • Based on the “business value of data” and storage cost
• HSM = Hierarchical Storage Management, based on “data freshness”; ILM goes one step further
• Applying this concept to a data grid is very tricky, as different autonomous domains have different business rules
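
The HSM/ILM difference can be shown with two placement functions. HSM looks only at freshness. This ILM version weighs a business value against the cost of the fast tier. The `business_value` score, the cost table, and the 30-day idle limit are hypothetical. In a data grid, each domain would supply its own value, which is what makes this tricky.

```python
from dataclasses import dataclass

# Hypothetical per-GB monthly storage cost for two tiers.
COST = {"fast": 10.0, "archive": 1.0}


@dataclass
class DataSet:
    name: str
    days_since_access: int
    business_value: float  # hypothetical per-GB monthly value set by the owning domain


def hsm_tier(ds: DataSet, max_idle_days: int = 30) -> str:
    """HSM: place purely by data freshness."""
    return "fast" if ds.days_since_access <= max_idle_days else "archive"


def ilm_tier(ds: DataSet) -> str:
    """ILM: keep on fast storage only while its value covers the fast-tier cost."""
    return "fast" if ds.business_value >= COST["fast"] else "archive"
```

A data set idle for 60 days but highly valued stays on fast storage under ILM while HSM archives it. A fresh but low-value data set goes the other way.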


Data Grid Triggers


Data Grid Triggers

• Similar to triggers in databases
• Based on ECA concepts
  • Event
  • Condition
  • Action
• Example
  • Event = insert a new file into the collection (“/ourProject/data”)
  • Condition = (color = “blue” && galaxy = “Andromeda”)
  • Action = Run (selectiveDataReplicator.dgl)
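
The ECA example above can be sketched as a small trigger manager. The `Trigger`/`TriggerManager` names and the idea of modeling metadata as a string dictionary are assumptions. Running `selectiveDataReplicator.dgl` is replaced by a Python callback that only records the path.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Metadata = Dict[str, str]


@dataclass
class Trigger:
    event: str                                  # e.g. "insert"
    collection: str                             # collection being watched
    condition: Callable[[Metadata], bool]       # Condition part of ECA
    action: Callable[[str, Metadata], None]     # Action part of ECA


class TriggerManager:
    def __init__(self) -> None:
        self.triggers: List[Trigger] = []

    def register(self, trigger: Trigger) -> None:
        self.triggers.append(trigger)

    def fire(self, event: str, path: str, metadata: Metadata) -> int:
        """Dispatch one event to matching triggers; return how many actions ran."""
        ran = 0
        for t in self.triggers:
            in_collection = path.startswith(t.collection.rstrip("/") + "/")
            if t.event == event and in_collection and t.condition(metadata):
                t.action(path, metadata)
                ran += 1
        return ran


replicated: List[str] = []


def run_selective_replicator(path: str, md: Metadata) -> None:
    # Stand-in for Run(selectiveDataReplicator.dgl): just record the path.
    replicated.append(path)


mgr = TriggerManager()
mgr.register(Trigger(
    "insert", "/ourProject/data",
    lambda md: md.get("color") == "blue" and md.get("galaxy") == "Andromeda",
    run_selective_replicator))
```

Inserting `/ourProject/data/img1.fits` with `color=blue` and `galaxy=Andromeda` runs the action. A red image in the same collection does not.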


Data Discovery

[Figure: the data grid's digital entities, meta-data, services, and state, shown before and after new data arrives]

• New data updates relationships among data in collections
• Services are invoked to analyze the new relationships
• DGMS applications are notified of state updates


Data Gridflows


Gridflow in SCEC (data information pipeline)

[Figure: SCEC pipeline stages: Metadata derivation, Ingest Metadata, Ingest Data, Determine analysis pipeline, Initiate automated analysis, Organize result data into distributed data grid collections]

• The optimal set of resources is used based on the task, on demand
• The pipeline can be triggered by input at the data source or by a data request from a user
• All gridflow activities are stored for data flow provenance
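
A gridflow with provenance capture can be sketched as an ordered list of stages. Each stage run is logged together with what triggered the flow. The stage order, stage bodies, file names, and record fields are hypothetical placeholders for the SCEC stages in the figure. The pipeline stops at the first failing stage.

```python
import time
from typing import Callable, Dict, List, Tuple

Stage = Tuple[str, Callable[[Dict], Dict]]


def run_gridflow(stages: List[Stage], context: Dict,
                 trigger: str) -> Tuple[Dict, List[Dict]]:
    """Run stages in order, recording one provenance record per activity."""
    provenance: List[Dict] = []
    for name, fn in stages:
        started = time.time()
        try:
            context = fn(context)
            status = "ok"
        except Exception as exc:  # record the failure, then stop the flow
            status = f"failed: {exc}"
        provenance.append({"stage": name, "trigger": trigger, "status": status,
                           "started": started, "finished": time.time()})
        if status != "ok":
            break
    return context, provenance


# Hypothetical stage bodies; each returns a new context dict.
def ingest_data(ctx): return {**ctx, "files": ["run01.dat", "run02.dat"]}
def derive_metadata(ctx): return {**ctx, "metadata": {f: {"kind": "waveform"} for f in ctx["files"]}}
def ingest_metadata(ctx): return {**ctx, "catalogued": sorted(ctx["metadata"])}
def determine_pipeline(ctx): return {**ctx, "pipeline": "waveform-analysis"}
def run_analysis(ctx): return {**ctx, "results": [f + ".out" for f in ctx["catalogued"]]}
def organize_results(ctx): return {**ctx, "collection": {"/results/" + ctx["pipeline"]: ctx["results"]}}


STAGES: List[Stage] = [
    ("ingest data", ingest_data),
    ("derive metadata", derive_metadata),
    ("ingest metadata", ingest_metadata),
    ("determine analysis pipeline", determine_pipeline),
    ("run automated analysis", run_analysis),
    ("organize results", organize_results),
]
```

Calling `run_gridflow(STAGES, {}, "data-source input")` returns the final context and six provenance records. Passing `"user request"` as the trigger covers the other way the slide says a pipeline can start.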


Data Grid Language (DGL)


Data Grid Language

• Requirement
  • Data Grid ILM process: the long-run process to be executed is described in DGL
  • Data Grid Triggers: the Action part of the ECA (Event-Condition-Action) logic
  • Data Gridflows: step-by-step execution of a long-run process on the data grid
• Analogous to SQL in relational databases
  • Long-run process procedures are stored and executed in the data grid itself
  • Captures the “Infrastructure Execution Logic”


DGL Request

[Figure: structure of a DGL request]
• Annotations about the Data Grid Request
• Either a Flow or a Status Query


DGL Requests (2 types)

• Data Grid Flow
  • An XML structure that describes the execution logic, associated procedural rules, and DGL variables. A flow can be synchronous or asynchronous.
• Status Query
  • An XML structure used to query the execution status of any gridflow or sub-flow at any level of granularity. Status queries can be made for both synchronous and asynchronous flows.
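
The two request types can be illustrated with a small in-process server. A flow submitted synchronously returns its final state. An asynchronous one returns immediately, and the caller then uses status queries for the whole flow or for a single sub-flow. This is a sketch only. Matrix exchanges XML requests over SOAP, while here flows are Python objects, sub-flows are plain callables, and the `name/stepN` status paths are hypothetical.

```python
import threading
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Flow:
    name: str
    steps: List[Callable[[], object]]  # hypothetical: each sub-flow is a callable


class FlowServer:
    """Runs flows synchronously or asynchronously and answers status queries."""

    def __init__(self) -> None:
        self._status: Dict[str, str] = {}
        self._threads: Dict[str, threading.Thread] = {}
        self._lock = threading.Lock()

    def _set(self, path: str, state: str) -> None:
        with self._lock:
            self._status[path] = state

    def _run(self, flow: Flow) -> None:
        self._set(flow.name, "running")
        for i, step in enumerate(flow.steps):
            path = f"{flow.name}/step{i}"
            self._set(path, "running")
            step()
            self._set(path, "completed")
        self._set(flow.name, "completed")

    def submit(self, flow: Flow, asynchronous: bool = False) -> str:
        # Register every sub-flow as pending up front, so a status query
        # has an answer even before execution reaches that sub-flow.
        self._set(flow.name, "pending")
        for i in range(len(flow.steps)):
            self._set(f"{flow.name}/step{i}", "pending")
        if asynchronous:
            worker = threading.Thread(target=self._run, args=(flow,))
            self._threads[flow.name] = worker
            worker.start()
            return "accepted"            # caller polls with status queries
        self._run(flow)
        return self.status(flow.name)    # synchronous: final state returned

    def status(self, path: str) -> str:
        """Status query for a flow ("name") or a sub-flow ("name/stepN")."""
        with self._lock:
            return self._status.get(path, "unknown")

    def wait(self, name: str) -> None:
        worker = self._threads.get(name)
        if worker is not None:
            worker.join()
```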


Flow

[Figure: structure of a DGL Flow]
• Scoped variables that can control the flow
• Logic used by the sub-members
• Sub-members, which are the real execution statements
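
One reading of "scoped variables" can be sketched with nested flows. A nested flow can read and update variables declared by an enclosing flow, while variables declared inside a sub-flow stay invisible to its parent. These lookup rules, the sequential-only flow logic, and the `Step` statement type are assumptions for illustration, not DGL's documented semantics.

```python
from __future__ import annotations

from typing import Callable, Dict, List, Optional, Union


class Step:
    """Hypothetical execution statement: SET var = compute(enclosing flow)."""

    def __init__(self, var: str, compute: Callable[[Flow], object]) -> None:
        self.var = var
        self.compute = compute


class Flow:
    """A flow with its own variable scope and an ordered list of sub-members."""

    def __init__(self, name: str, variables: Optional[Dict[str, object]] = None,
                 members: Optional[List[Union[Flow, Step]]] = None) -> None:
        self.name = name
        self.variables: Dict[str, object] = dict(variables or {})
        self.members: List[Union[Flow, Step]] = list(members or [])
        self.parent: Optional[Flow] = None
        for member in self.members:
            if isinstance(member, Flow):
                member.parent = self  # nested flows see this flow's scope

    def _owner(self, var: str) -> Optional[Flow]:
        """Nearest scope (this flow, then its ancestors) that declares var."""
        scope: Optional[Flow] = self
        while scope is not None:
            if var in scope.variables:
                return scope
            scope = scope.parent
        return None

    def lookup(self, var: str) -> object:
        owner = self._owner(var)
        if owner is None:
            raise KeyError(var)
        return owner.variables[var]

    def assign(self, var: str, value: object) -> None:
        # Write to the scope that declares var; otherwise declare it locally.
        owner = self._owner(var) or self
        owner.variables[var] = value

    def run(self) -> None:
        # Flow logic in this sketch is strictly sequential.
        for member in self.members:
            if isinstance(member, Flow):
                member.run()
            else:
                self.assign(member.var, member.compute(self))
```

Running an outer flow that declares `count = 41` around an inner flow that computes `tmp = count + 1` and then sets `count = tmp` leaves `count == 42` in the outer scope. `tmp` exists only in the inner one.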


Flow Logic (How a flow executes)


…
<userDefinedRule name="beforeEntry">
  <condition>
    <simpleQuery>$numVar == 1</simpleQuery>
  </condition>
  <action name="true">
    <actionString>SET var1 = 1</actionString>
  </action>
  <action name="true">
    <actionString>SET var2 = "foo"</actionString>
  </action>
  <action name="false">
    <actionString>SET var1 = 0</actionString>
  </action>
</userDefinedRule>
…
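
The rule above can be read as: evaluate the condition, then run every action whose `name` matches the outcome. A minimal interpreter under that assumption follows. It handles only conditions of the form `$var == literal` and actions of the form `SET var = literal`, with integer or double-quoted string literals. It is not the Matrix rule engine.

```python
import re
import xml.etree.ElementTree as ET
from typing import Dict

RULE_XML = """
<userDefinedRule name="beforeEntry">
  <condition><simpleQuery>$numVar == 1</simpleQuery></condition>
  <action name="true"><actionString>SET var1 = 1</actionString></action>
  <action name="true"><actionString>SET var2 = "foo"</actionString></action>
  <action name="false"><actionString>SET var1 = 0</actionString></action>
</userDefinedRule>
"""


def parse_literal(text: str) -> object:
    """Assumed literal syntax: a double-quoted string or an integer."""
    text = text.strip()
    if len(text) >= 2 and text[0] == text[-1] == '"':
        return text[1:-1]
    return int(text)


def eval_simple_query(query: str, variables: Dict[str, object]) -> bool:
    # Only "$name == literal" is supported in this sketch.
    m = re.fullmatch(r"\s*\$(\w+)\s*==\s*(.+?)\s*", query)
    if m is None:
        raise ValueError(f"unsupported query: {query!r}")
    name, literal = m.groups()
    return variables.get(name) == parse_literal(literal)


def apply_set(action_string: str, variables: Dict[str, object]) -> None:
    m = re.fullmatch(r"\s*SET\s+(\w+)\s*=\s*(.+?)\s*", action_string)
    if m is None:
        raise ValueError(f"unsupported action: {action_string!r}")
    name, literal = m.groups()
    variables[name] = parse_literal(literal)


def run_rule(rule_xml: str, variables: Dict[str, object]) -> Dict[str, object]:
    """Evaluate the rule's condition and run the matching actions on a copy of variables."""
    result = dict(variables)
    rule = ET.fromstring(rule_xml.strip())
    query = rule.find("condition/simpleQuery").text
    outcome = "true" if eval_simple_query(query, result) else "false"
    for action in rule.findall("action"):
        if action.get("name") == outcome:
            apply_set(action.find("actionString").text, result)
    return result
```

With `numVar = 1`, both "true" actions run and set `var1 = 1` and `var2 = "foo"`. Any other value runs the single "false" action.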


What is SRB Matrix?

• Matrix provides the SRB as a Web Service
  • The Web Service is based on the Data Grid Language
• SOA for a Data Grid or Digital Library
  • Service-oriented *infrastructure*
• Asynchronous end-user-facing applications
  • Long-run operations presented to users as portlets
• Data Grid Automation and ILM
  • File triggers on unstructured data
  • Automated movement or management of data


Matrix Gridflow Server Architecture

[Figure: Matrix gridflow server architecture, with the following components]
• SOAP Service for Matrix Clients
• WSDL Description
• JAXM Wrapper
• Matrix Data Grid Request Processor
• Transaction Handler
• Status Query Handler
• Flow Handler and Execution Manager
• ECA Rules Handler
• Workflow Query Processor
• XQuery Processor
• Gridflow Meta Data Manager
• Persistence (Store) Abstraction
• In Memory Store
• JDBC
• Matrix Agent Abstraction
• Agents for Java, WSDL and other grid executables
• SDSC SRB Agents
• Other SDSC Data Services
• JMS Messaging Interface
• Event Publish, Subscribe, Notification
• Sangam P2P Gridflow Broker and Protocols
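
The architecture lists a Persistence (Store) Abstraction next to an In Memory Store and JDBC. A natural reading is one storage interface with interchangeable back ends. A minimal sketch of that pattern follows. The `FlowStore` interface and its methods are hypothetical. Python's `sqlite3` stands in for a JDBC-backed database, and flow state is serialized as JSON.

```python
import json
import sqlite3
from abc import ABC, abstractmethod
from typing import Dict, Optional


class FlowStore(ABC):
    """Persistence abstraction: the request processor sees only this interface."""

    @abstractmethod
    def save(self, flow_id: str, state: Dict) -> None:
        ...

    @abstractmethod
    def load(self, flow_id: str) -> Optional[Dict]:
        ...


class InMemoryStore(FlowStore):
    def __init__(self) -> None:
        self._data: Dict[str, str] = {}

    def save(self, flow_id: str, state: Dict) -> None:
        # Serialize so callers cannot mutate stored state through shared references.
        self._data[flow_id] = json.dumps(state)

    def load(self, flow_id: str) -> Optional[Dict]:
        raw = self._data.get(flow_id)
        return None if raw is None else json.loads(raw)


class SqlStore(FlowStore):
    """SQL-backed store; sqlite3 stands in here for a JDBC-backed database."""

    def __init__(self, path: str = ":memory:") -> None:
        self._conn = sqlite3.connect(path)
        self._conn.execute(
            "CREATE TABLE IF NOT EXISTS flow_state (id TEXT PRIMARY KEY, state TEXT NOT NULL)")

    def save(self, flow_id: str, state: Dict) -> None:
        with self._conn:  # commits the transaction on success
            self._conn.execute(
                "INSERT OR REPLACE INTO flow_state (id, state) VALUES (?, ?)",
                (flow_id, json.dumps(state)))

    def load(self, flow_id: str) -> Optional[Dict]:
        row = self._conn.execute(
            "SELECT state FROM flow_state WHERE id = ?", (flow_id,)).fetchone()
        return None if row is None else json.loads(row[0])
```

Because both classes satisfy `FlowStore`, the back end can be swapped without touching the code that handles flows and status queries.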


Conclusion

• Data grids are evolving
• Data Grid Automation of long-run processes is essential
• A language for Data Grid Automation is needed
• Data Grid Language is one such effort, as part of the SRB Matrix Project
  • An open source project for anyone to use (or join)
  • [email protected] (or [email protected])