STORK: A Scheduler for Data Placement Activities in Grid
Tevfik Kosar, University of Wisconsin-Madison
kosart@cs.wisc.edu
Some Remarkable Numbers
Characteristics of four physics experiments targeted by GriPhyN:

Application   First Data   Data Volume (TB/yr)   User Community
SDSS          1999         10                    100s
LIGO          2002         250                   100s
ATLAS/CMS     2005         5,000                 1000s

Source: GriPhyN Proposal, 2000
Even More Remarkable…
“…the data volume of CMS is expected to subsequently increase rapidly, so that the accumulated data volume will reach 1 Exabyte (1 million Terabytes) by around 2015.”
Source: PPDG Deliverables to CMS
Other Data Intensive Applications
• Genomic information processing applications
• Biomedical Informatics Research Network (BIRN) applications
• Cosmology applications (MADCAP)
• Methods for modeling large molecular systems
• Coupled climate modeling applications
• Real-time observatories, applications, and data-management (ROADNet)
Need to Deal with Data Placement
Data need to be moved, staged, replicated, cached, and removed; storage space for data should be allocated and de-allocated. We call all of these data-related activities in the Grid Data Placement (DaP) activities.
State of the Art
Data placement activities in the Grid are performed either manually or by simple scripts. They are simply regarded as “second class citizens” of the computation-dominated Grid world.
Our Goal
Our goal is to make data placement activities “first class citizens” in the Grid, just like computational jobs! They need to be queued, scheduled, monitored and managed, and even checkpointed.
Outline
• Introduction
• Grid Challenges
• Stork Solutions
• Case Study: SRB-UniTree Data Pipeline
• Conclusions & Future Work
Grid Challenges
• Heterogeneous Resources
• Limited Resources
• Network/Server/Software Failures
• Different Job Requirements
• Scheduling of Data & CPU together
Stork
Stork intelligently and reliably schedules, runs, monitors, and manages Data Placement (DaP) jobs in a heterogeneous Grid environment, and ensures that they complete. What Condor is to computational jobs, Stork is to DaP jobs: just submit a bunch of DaP jobs and then relax.
Stork Solutions to Grid Challenges
• Specialized in Data Management
• Modularity & Extendibility
• Failure Recovery
• Global & Job Level Policies
• Interaction with Higher Level Planners/Schedulers
Already Supported URLs
• file:/        -> Local File
• ftp://        -> FTP
• gsiftp://     -> GridFTP
• nest://       -> NeST (chirp) protocol
• srb://        -> SRB (Storage Resource Broker)
• srm://        -> SRM (Storage Resource Manager)
• unitree://    -> UniTree server
• diskrouter:// -> UW DiskRouter
Higher Level Planners
[Architecture figure: a higher level planner (DAGMan) sits above Condor-G (compute) and Stork (DaP); the underlying services and resources include RFT, GateKeeper, StartD, SRM, SRB, NeST, and GridFTP.]
Interaction with DAGMan

Job A A.submit
DaP X X.submit
Job C C.submit
Parent A child C, X
Parent X child B
…..

[Figure: DAGMan walks a DAG containing both computational nodes (A, B, C, D) and DaP nodes (X, Y), submitting the computational jobs to the Condor job queue and the DaP jobs to the Stork job queue.]
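For contrast, here is a minimal sketch of what the computational node's submit file (A.submit above) could look like as an ordinary Condor submit description; the executable, arguments, and file names are hypothetical, not taken from the slides. The DaP node's X.submit would instead be a Stork submit file like the one shown on the next slide.

# A.submit -- hypothetical Condor submit description for computational job A
universe   = vanilla
executable = process_x.sh
arguments  = x.dat
output     = A.out
error      = A.err
log        = A.log
queue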
Sample Stork submit file
[
        Type     = "Transfer";
        Src_Url  = "srb://ghidorac.sdsc.edu/kosart.condor/x.dat";
        Dest_Url = "nest://turkey.cs.wisc.edu/kosart/x.dat";
        …………
        Max_Retry  = 10;
        Restart_in = "2 hours";
]
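Since the transfer protocol on each end is selected by the URL scheme (see the “Already Supported URLs” slide), retargeting the same job is just a matter of changing the two URLs. Below is a hypothetical variant of the job above that stages a file from a GridFTP server into UniTree; the host names and paths are made up purely for illustration.

[
        Type     = "Transfer";
        Src_Url  = "gsiftp://gridftp.example.edu/data/x.dat";
        Dest_Url = "unitree://unitree.example.edu/archive/x.dat";
        Max_Retry  = 10;
        Restart_in = "2 hours";
]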
Case Study: SRB-UniTree Data Pipeline
We have transferred ~3 TB of DPOSS data (2611 x 1.1 GB files) from SRB to UniTree using 3 different pipeline configurations. The pipelines are built using Condor and Stork scheduling technologies. The whole process is managed by DAGMan.
[Figure, Configuration 1: SRB get from the SRB Server to an NCSA cache node, then UniTree put from the cache into the UniTree Server, all driven from a single submit site.]
[Figure, Configuration 2: SRB get from the SRB Server to an SDSC cache node, GridFTP transfer from the SDSC cache to an NCSA cache node, then UniTree put into the UniTree Server, driven from the submit site.]
[Figure, Configuration 3: SRB get from the SRB Server to an SDSC cache node, DiskRouter transfer from the SDSC cache to an NCSA cache node, then UniTree put into the UniTree Server, driven from the submit site.]
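To make the pipelines concrete, here is a minimal sketch of how a single file might be driven through Configuration 2 as a small DAG of DaP jobs, written in the style of the DAGMan slide earlier; the node names and submit-file names are hypothetical, and the real pipelines may have been organized differently.

# Hypothetical per-file DAG for Configuration 2
# GET:  SRB get,        SRB Server -> SDSC cache
# MOVE: GridFTP,        SDSC cache -> NCSA cache
# PUT:  UniTree put,    NCSA cache -> UniTree Server
DaP GET  srb_get.submit
DaP MOVE gridftp_move.submit
DaP PUT  unitree_put.submit
Parent GET  child MOVE
Parent MOVE child PUT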
Outcomes of the Study
1. Stork interacted easily and successfully with different underlying systems: SRB, UniTree, GridFTP, and DiskRouter.
Outcomes of the Study (2)
2. We had the chance to compare different pipeline topologies and configurations:
Configuration   End-to-end rate (MB/sec)
1               5.0
2               3.2
3               5.95
Outcomes of the Study (3)
3. Almost all possible network, server, and software failures were recovered automatically.
Failure Recovery
[Figure: timeline of incidents handled automatically during the transfers, including UniTree not responding, the DiskRouter being reconfigured and restarted, an SDSC cache reboot and a UW CS network outage, and SRB server maintenance.]
For more information on the results of this study, please check:
http://www.cs.wisc.edu/condor/stork/
Conclusions
• Stork makes data placement a “first class citizen”.
• Stork is the Condor of the data placement world.
• Stork is fault tolerant, easy to use, modular, extendible, and very flexible.
Future Work
• More intelligent scheduling
• Data level management instead of file level management
• Checkpointing for transfers
• Security
You don’t have to FedEx your data anymore… Stork delivers it for you!

For more information, drop by my office anytime:
• Room: 3361, Computer Science & Stats. Bldg.
Email to:
• kosart@cs.wisc.edu