
STORK: A Scheduler for Data Placement Activities in Grid

Tevfik Kosar
University of Wisconsin-Madison
kosart@cs.wisc.edu


Some Remarkable Numbers

Characteristics of four physics experiments targeted by GriPhyN:

Application | First Data | Data Volume (TB/yr) | User Community
SDSS        | 1999       | 10                  | 100s
LIGO        | 2002       | 250                 | 100s
ATLAS/CMS   | 2005       | 5,000               | 1000s

Source: GriPhyN Proposal, 2000


Even More Remarkable…

“…the data volume of CMS is expected to subsequently increase rapidly, so that the accumulated data volume will reach 1 Exabyte (1 million Terabytes) by around 2015.”

Source: PPDG Deliverables to CMS


Other Data Intensive Applications

• Genomic information processing applications
• Biomedical Informatics Research Network (BIRN) applications
• Cosmology applications (MADCAP)
• Methods for modeling large molecular systems
• Coupled climate modeling applications
• Real-time observatories, applications, and data-management (ROADNet)


Need to Deal with Data Placement

Data need to be moved, staged, replicated, cached, and removed; storage space for data must be allocated and de-allocated. We refer to all of these data-related activities in the Grid as Data Placement (DaP) activities.


State of the Art

Data placement activities in the Grid are performed either manually or by simple scripts. They are simply regarded as “second-class citizens” of the computation-dominated Grid world.


Our Goal

Our goal is to make data placement activities “first-class citizens” in the Grid, just like computational jobs! They need to be queued, scheduled, monitored, managed, and even checkpointed.


Outline

• Introduction
• Grid Challenges
• Stork Solutions
• Case Study: SRB-UniTree Data Pipeline
• Conclusions & Future Work


Grid Challenges

• Heterogeneous Resources
• Limited Resources
• Network/Server/Software Failures
• Different Job Requirements
• Scheduling of Data & CPU together


Stork

Stork intelligently and reliably schedules, runs, monitors, and manages Data Placement (DaP) jobs in a heterogeneous Grid environment, and ensures that they complete. What Condor is for computational jobs, Stork is for DaP jobs: just submit a bunch of DaP jobs and then relax.


Stork Solutions to Grid Challenges

• Specialized in Data Management
• Modularity & Extendibility
• Failure Recovery
• Global & Job Level Policies
• Interaction with Higher Level Planners/Schedulers


Already Supported URLs

• file:/ -> Local File
• ftp:// -> FTP
• gsiftp:// -> GridFTP
• nest:// -> NeST (chirp) protocol
• srb:// -> SRB (Storage Resource Broker)
• srm:// -> SRM (Storage Resource Manager)
• unitree:// -> UniTree server
• diskrouter:// -> UW DiskRouter
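For illustration only, a minimal transfer request pairing two of these schemes could look like the sketch below, written with the submit-file fields shown a few slides later; the host and path names here are hypothetical:

[
    Type     = "Transfer";
    Src_Url  = "file:/tmp/input/y.dat";
    Dest_Url = "gsiftp://gridftp.example.edu/staging/y.dat";
]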


Higher Level Planners

[Architecture diagram: higher-level planners sit above DAGMan, which places computational jobs into a Condor-G job queue (compute) and DaP jobs into a Stork job queue; the other components shown are GateKeeper, StartD, RFT, SRM, SRB, NeST, and GridFTP.]

Interaction with DAGMan

Job A A.submit
DaP X X.submit
Job C C.submit
Parent A child C, X
Parent X child B
…..

[Diagram: DAGMan walks the DAG and places the computational jobs (A, B, C, D) into the Condor job queue and the DaP jobs (X, Y) into the Stork job queue.]
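A common pattern this enables is wrapping a computational job between two DaP jobs that stage its input in and its output out. In the simplified notation of the fragment above (job and submit-file names are illustrative, and actual DAGMan keyword spellings may differ), such a DAG might be sketched as:

DaP StageIn  stage_in.submit    # transfer input data to the execute site
Job Process  process.submit     # computational job run through Condor(-G)
DaP StageOut stage_out.submit   # transfer results to archival storage
Parent StageIn child Process
Parent Process child StageOut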


Sample Stork submit file

[
    Type       = "Transfer";
    Src_Url    = "srb://ghidorac.sdsc.edu/kosart.condor/x.dat";
    Dest_Url   = "nest://turkey.cs.wisc.edu/kosart/x.dat";
    …………
    Max_Retry  = 10;
    Restart_in = "2 hours";
]
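Here Max_Retry and Restart_in carry per-job policy: how many times a failed transfer may be retried, and when a stalled transfer should be restarted. Other DaP activities, such as the storage-space allocation mentioned earlier, would be described in the same way. The sketch below is only an illustration; the "Reserve" type and its fields are hypothetical, not confirmed Stork syntax:

[
    Type     = "Reserve";                    // hypothetical DaP job type for space allocation
    Host     = "nest://turkey.cs.wisc.edu";  // hypothetical field: storage server to reserve space on
    Size     = "2 GB";                       // hypothetical field: amount of space requested
    Duration = "12 hours";                   // hypothetical field: how long to hold the reservation
]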


Case Study: SRB-UniTree Data Pipeline

We have transferred ~3 TB of DPOSS data (2,611 files of 1.1 GB each) from SRB to UniTree using three different pipeline configurations. The pipelines were built using Condor and Stork scheduling technologies, and the whole process was managed by DAGMan.

Configuration 1 (diagram): each file moves from the SRB server to the NCSA cache node via an SRB get, and from the NCSA cache into the UniTree server via a UniTree put, with the transfers driven from the submit site.

Configuration 2 (diagram): each file moves from the SRB server to the SDSC cache via an SRB get, from the SDSC cache to the NCSA cache via GridFTP, and from the NCSA cache into the UniTree server via a UniTree put, again driven from the submit site.

Configuration 3 (diagram): each file moves from the SRB server to the SDSC cache via an SRB get, from the SDSC cache to the NCSA cache via DiskRouter, and from the NCSA cache into the UniTree server via a UniTree put, driven from the submit site.
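Each file's trip through one of these pipelines can be expressed as a small chain of Stork jobs under DAGMan control. As a sketch in the DAG notation from the earlier slide (job and submit-file names are illustrative), configuration 2 for a single file might look like:

DaP Get  srb_get.submit        # SRB get: SRB server -> SDSC cache
DaP Move gridftp_move.submit   # GridFTP: SDSC cache -> NCSA cache
DaP Put  unitree_put.submit    # UniTree put: NCSA cache -> UniTree server
Parent Get child Move
Parent Move child Put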

Outcomes of the Study

1. Stork interacted easily and successfully with different underlying systems: SRB, UniTree, GridFTP, and DiskRouter.


Outcomes of the Study (2)

2. We had the chance to compare different pipeline topologies and configurations:

Configuration | End-to-end rate (MB/sec)
1             | 5.0
2             | 3.2
3             | 5.95


Outcomes of the Study (3)

3. Almost all of the network, server, and software failures encountered during the transfers were recovered from automatically.


Failure Recovery

Failures recovered automatically during the runs included:
• UniTree not responding
• DiskRouter reconfigured and restarted
• SDSC cache reboot & UW CS network outage
• SRB server maintenance

For more information on the results of this study, please check:

http://www.cs.wisc.edu/condor/stork/


Conclusions

• Stork makes data placement a “first-class citizen” of the Grid.
• Stork is the Condor of the data placement world.
• Stork is fault tolerant, easy to use, modular, extendible, and very flexible.


Future Work

• More intelligent scheduling
• Data-level management instead of file-level management
• Checkpointing for transfers
• Security


You don’t have to FedEx your data anymore… Stork delivers it for you!

For more information:
• Drop by my office anytime: Room 3361, Computer Science & Statistics Bldg.
• Email: kosart@cs.wisc.edu
