STORK: A Scheduler for Data Placement Activities in Grid
Tevfik Kosar, University of Wisconsin-Madison
kosart@cs.wisc.edu
Some Remarkable Numbers
Characteristics of four physics experiments targeted by GriPhyN:

Application   First Data   Data Volume (TB/yr)   User Community
SDSS          1999         10                    100s
LIGO          2002         250                   100s
ATLAS/CMS     2005         5,000                 1000s

Source: GriPhyN Proposal, 2000
Even More Remarkable…
“…the data volume of CMS is expected to subsequently increase rapidly, so that the accumulated data volume will reach 1 Exabyte (1 million Terabytes) by around 2015.”
Source: PPDG Deliverables to CMS
Other Data Intensive Applications
• Genomic information processing applications
• Biomedical Informatics Research Network (BIRN) applications
• Cosmology applications (MADCAP)
• Methods for modeling large molecular systems
• Coupled climate modeling applications
• Real-time observatories, applications, and data-management (ROADNet)
Need to Deal with Data Placement
Data need to be moved, staged, replicated, cached, and removed; storage space for data should be allocated and de-allocated. We call all of these data-related activities in the Grid Data Placement (DaP) activities.
State of the Art
Data placement activities in the Grid are performed either manually or by simple scripts. They are simply regarded as “second class citizens” of the computation-dominated Grid world.
Our Goal
Our goal is to make data placement activities “first class citizens” in the Grid, just like computational jobs! They need to be queued, scheduled, monitored and managed, and even checkpointed.
Outline
• Introduction
• Grid Challenges
• Stork Solutions
• Case Study: SRB-UniTree Data Pipeline
• Conclusions & Future Work
Grid Challenges
• Heterogeneous Resources
• Limited Resources
• Network/Server/Software Failures
• Different Job Requirements
• Scheduling of Data & CPU together
Stork
Stork intelligently and reliably schedules, runs, monitors, and manages Data Placement (DaP) jobs in a heterogeneous Grid environment, and ensures that they complete. What Condor is to computational jobs, Stork is to DaP jobs: just submit a bunch of DaP jobs and then relax.
Stork Solutions to Grid Challenges
• Specialized in Data Management
• Modularity & Extendibility
• Failure Recovery
• Global & Job Level Policies
• Interaction with Higher Level Planners/Schedulers
Already Supported URLs
• file:/        -> Local File
• ftp://        -> FTP
• gsiftp://     -> GridFTP
• nest://       -> NeST (chirp) protocol
• srb://        -> SRB (Storage Resource Broker)
• srm://        -> SRM (Storage Resource Manager)
• unitree://    -> UniTree server
• diskrouter:// -> UW DiskRouter
Higher Level Planners
[Architecture figure: a higher level planner (DAGMan) sits above Condor-G (compute) and Stork (DaP); the underlying services and resources include RFT, GateKeeper, StartD, SRM, SRB, NeST, and GridFTP.]
Interaction with DAGMan

Job A A.submit
DaP X X.submit
Job C C.submit
Parent A child C, X
Parent X child B
…..

[Figure: DAGMan walks a DAG containing both computational nodes (A, B, C, D) and DaP nodes (X, Y), submitting the computational jobs to the Condor job queue and the DaP jobs to the Stork job queue.]
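For contrast, here is a minimal sketch of what the computational node's submit file (A.submit above) could look like as an ordinary Condor submit description; the executable, arguments, and file names are hypothetical, not taken from the slides. The DaP node's X.submit would instead be a Stork submit file like the one shown on the next slide.

# A.submit -- hypothetical Condor submit description for computational job A
universe   = vanilla
executable = process_x.sh
arguments  = x.dat
output     = A.out
error      = A.err
log        = A.log
queue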
Sample Stork submit file
[
        Type     = "Transfer";
        Src_Url  = "srb://ghidorac.sdsc.edu/kosart.condor/x.dat";
        Dest_Url = "nest://turkey.cs.wisc.edu/kosart/x.dat";
        …………
        Max_Retry  = 10;
        Restart_in = "2 hours";
]
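Since the transfer protocol on each end is selected by the URL scheme (see the “Already Supported URLs” slide), retargeting the same job is just a matter of changing the two URLs. Below is a hypothetical variant of the job above that stages a file from a GridFTP server into UniTree; the host names and paths are made up purely for illustration.

[
        Type     = "Transfer";
        Src_Url  = "gsiftp://gridftp.example.edu/data/x.dat";
        Dest_Url = "unitree://unitree.example.edu/archive/x.dat";
        Max_Retry  = 10;
        Restart_in = "2 hours";
]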
Case Study: SRB-UniTree Data Pipeline
We have transferred ~3 TB of DPOSS data (2611 x 1.1 GB files) from SRB to UniTree using 3 different pipeline configurations. The pipelines are built using Condor and Stork scheduling technologies. The whole process is managed by DAGMan.
[Figure, Configuration 1: SRB get from the SRB Server to an NCSA cache node, then UniTree put from the cache into the UniTree Server, all driven from a single submit site.]
[Figure, Configuration 2: SRB get from the SRB Server to an SDSC cache node, GridFTP transfer from the SDSC cache to an NCSA cache node, then UniTree put into the UniTree Server, driven from the submit site.]
[Figure, Configuration 3: SRB get from the SRB Server to an SDSC cache node, DiskRouter transfer from the SDSC cache to an NCSA cache node, then UniTree put into the UniTree Server, driven from the submit site.]
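To make the pipelines concrete, here is a minimal sketch of how a single file might be driven through Configuration 2 as a small DAG of DaP jobs, written in the style of the DAGMan slide earlier; the node names and submit-file names are hypothetical, and the real pipelines may have been organized differently.

# Hypothetical per-file DAG for Configuration 2
# GET:  SRB get,        SRB Server -> SDSC cache
# MOVE: GridFTP,        SDSC cache -> NCSA cache
# PUT:  UniTree put,    NCSA cache -> UniTree Server
DaP GET  srb_get.submit
DaP MOVE gridftp_move.submit
DaP PUT  unitree_put.submit
Parent GET  child MOVE
Parent MOVE child PUT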
Outcomes of the Study
1. Stork interacted easily and successfully with different underlying systems: SRB, UniTree, GridFTP, and DiskRouter.
Outcomes of the Study (2)
2. We had the chance to compare different pipeline topologies and configurations:
Configuration   End-to-end rate (MB/sec)
1               5.0
2               3.2
3               5.95
Outcomes of the Study (3)
3. Almost all possible network, server, and software failures were recovered automatically.
Failure Recovery
[Figure: timeline of incidents handled automatically during the transfers, including UniTree not responding, the DiskRouter being reconfigured and restarted, an SDSC cache reboot and a UW CS network outage, and SRB server maintenance.]
For more information on the results of this study, please check:
http://www.cs.wisc.edu/condor/stork/
Conclusions
• Stork makes data placement a “first class citizen”.
• Stork is the Condor of the data placement world.
• Stork is fault tolerant, easy to use, modular, extendible, and very flexible.
Future Work
• More intelligent scheduling
• Data level management instead of file level management
• Checkpointing for transfers
• Security
You don’t have to FedEx your data anymore… Stork delivers it for you!

For more information, drop by my office anytime:
• Room: 3361, Computer Science & Stats. Bldg.
Email to:
• kosart@cs.wisc.edu