Dan Bradley
Computer Sciences Department
University of Wisconsin-Madison
http://www.cs.wisc.edu/condor
Schedd On The Side
What is it?
A specialized scheduler operating on the schedd’s jobs.
[Diagram: Schedd and Schedd On The Side, with the job queue: Job 1, Job 2, Job 3, Job 4, Job 5, …, Job 4*.]
Condor Farm Story
[Diagram: an application submits jobs through condor_submit to the schedd’s job queue; the schedd runs the jobs (each labeled RandomSeed) on startd resources.]
• Now that this is working, how can I use my collaborator’s resources too?
Option #1: Merge Farms
› Combine your machines and your collaborator’s into one Condor resource pool.
  o Everything works just like it did before.
  o Excellent option for small to medium clusters.
  o Requires bidirectional connectivity to all startds, or the equivalent via GCB (Generic Connection Brokering).
  o Requires some administrative coordination (e.g. upgrades, negotiator policy, security).
Option #2: Flocking Together
[Diagram: the schedd runs jobs on local startds and, by flocking, on remote startds.]
Pros:
• full featured (standard universe, etc.)
• automatic matchmaking
• easy to configure
Cons:
• requires bidirectional connectivity
• both sites must run Condor
Option #3: Grid Universe
[Diagram: the schedd runs vanilla jobs on local startds and grid universe jobs through the gatekeeper at site X.]
Pros:
• easier to live with private networks
• may use non-Condor resources
Cons:
• restricted Condor feature set (e.g. no standard universe over the grid)
• jobs must be pre-allocated between the vanilla and grid universes
Option #4: Routing Jobs
[Diagram: the schedd runs vanilla jobs on local startds, while the Schedd On The Side routes jobs through gatekeepers at sites X, Y, and Z.]
• dynamic allocation of jobs between the vanilla and grid universes
• not every job is appropriate for transformation into a grid job
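Routing a job can be pictured as copying a vanilla job’s ClassAd and rewriting the copy as a grid universe job. The original stays in the schedd’s queue. The following is a minimal Python sketch under stated assumptions, not the actual router code. Job ads are modeled as plain dicts. JobUniverse, GridResource, ClusterId, and ProcId follow common Condor ClassAd usage. The `RoutedFromJobId` attribute, the `route_job` helper, and the gatekeeper hostname are hypothetical.

```python
import copy

# Universe numbers as used in the JobUniverse attribute of Condor job ads.
VANILLA_UNIVERSE = 5
GRID_UNIVERSE = 9


def route_job(job_ad: dict, grid_resource: str) -> dict:
    """Return a grid-universe copy of a vanilla job ad.

    The original ad is left untouched so the schedd still owns it; the copy
    records which job it came from (hypothetical RoutedFromJobId attribute).
    """
    if job_ad.get("JobUniverse") != VANILLA_UNIVERSE:
        raise ValueError("only vanilla jobs are routed in this sketch")
    routed = copy.deepcopy(job_ad)
    routed["JobUniverse"] = GRID_UNIVERSE
    routed["GridResource"] = grid_resource
    routed["RoutedFromJobId"] = f"{job_ad['ClusterId']}.{job_ad['ProcId']}"
    return routed


# Hypothetical example: Job 4 becomes its routed counterpart, Job 4*.
job4 = {"ClusterId": 4, "ProcId": 0, "JobUniverse": VANILLA_UNIVERSE, "Cmd": "RandomSeed"}
job4_star = route_job(job4, "gt2 gatekeeper.site-x.example/jobmanager")
print(job4_star["JobUniverse"], job4_star["RoutedFromJobId"])  # prints: 9 4.0
```

Copying rather than mutating the ad matches the idea of a scheduler "on the side": the home schedd keeps the authoritative job while the routed copy runs elsewhere.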
What About Flow Control?
› May restrict routing to jobs that have been rejected by the negotiator.
› May limit the maximum number of actively routed jobs on a per-site basis.
› May limit the maximum number of idle routed jobs per site.
› Periodic removal of idle routed jobs is possible, but there is no guarantee of optimal rescheduling.
› Routing table may be reconfigured dynamically.
› Multicast? Might be interesting to try.
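The first three flow-control rules above can be combined into one site-selection step. This is a minimal Python sketch under stated assumptions. The `Site` class, the `pick_site` function, and the `rejected_by_negotiator` flag are hypothetical stand-ins, not Condor’s configuration or API, and the per-site limits are example values.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Site:
    """Routing limits and current counts for one destination site (hypothetical model)."""
    name: str
    max_routed: int  # cap on actively routed jobs at this site
    max_idle: int    # cap on routed jobs still waiting (idle) at this site
    routed: int = 0
    idle: int = 0

    def has_room(self) -> bool:
        return self.routed < self.max_routed and self.idle < self.max_idle


def pick_site(job: dict, sites: List[Site]) -> Optional[Site]:
    """Return the site a job should be routed to, or None to leave it local."""
    # Rule 1: only route jobs the negotiator could not match locally.
    if not job.get("rejected_by_negotiator", False):
        return None
    # Rules 2 and 3: respect per-site limits on active and idle routed jobs.
    for site in sites:
        if site.has_room():
            site.routed += 1
            site.idle += 1  # a freshly routed job starts out idle at the site
            return site
    return None


sites = [Site("X", max_routed=2, max_idle=1), Site("Y", max_routed=10, max_idle=5)]
rejected = {"rejected_by_negotiator": True}
first = pick_site(rejected, sites)   # X has room
second = pick_site(rejected, sites)  # X is at its idle limit, so Y is chosen
local = pick_site({"rejected_by_negotiator": False}, sites)  # job stays local
```

A real router would also lower the idle count when a routed job starts running and lower both counts when it finishes. That bookkeeping is omitted here.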
What About I/O?
› Jobs must be sandboxable (i.e. they specify input and output via the file-transfer mechanism).
› Routing of standard universe jobs is not supported.
› Additional restrictions may apply, depending on the site’s network and disk.
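The I/O restrictions above amount to an eligibility test applied before a job is considered for routing. This is a rough Python sketch. It assumes job ads are dicts using the JobUniverse and ShouldTransferFiles attribute names of Condor job ClassAds. For simplicity it accepts only `ShouldTransferFiles == "YES"` as sandboxable. The `is_routable` helper itself is hypothetical.

```python
# Universe numbers as used in the JobUniverse attribute of Condor job ads.
STANDARD_UNIVERSE = 1
VANILLA_UNIVERSE = 5


def is_routable(job_ad: dict) -> bool:
    """Rough check that a job could be transformed into a grid job."""
    universe = job_ad.get("JobUniverse")
    if universe == STANDARD_UNIVERSE:
        return False  # routing of standard universe jobs is not supported
    if universe != VANILLA_UNIVERSE:
        return False  # this sketch only considers vanilla jobs
    # Sandboxable: input and output are moved by file transfer,
    # not read from a shared filesystem that the remote site lacks.
    return str(job_ad.get("ShouldTransferFiles", "NO")).upper() == "YES"
```

Site-specific limits on network and disk usage would need additional checks beyond this test.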
What Types of Grids?
› The routing table may contain any combination of grid types supported by the grid universe.
› Example: Condor-C
[Diagram: the Schedd On The Side routes jobs from the local schedd to schedd X at site X.]
• for two Condor sites, schedd-to-schedd submission requires no additional software
• however, it is still not as trivial to use as flocking
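A routing table that mixes grid types can be modeled as a list of routes, each carrying a grid universe GridResource string whose first word names the grid type. This is a minimal Python sketch. The `condor <schedd> <collector>` and `gt2 <host>/<jobmanager>` forms follow grid universe conventions. The table layout, route names, hostnames, and helper functions are hypothetical.

```python
from typing import Dict, List

# Hypothetical routing table mixing grid types; hostnames are placeholders.
ROUTING_TABLE: List[Dict[str, str]] = [
    # Condor-C: schedd-to-schedd submission (remote schedd, remote collector)
    {"name": "site_x", "GridResource": "condor schedd.site-x.example cm.site-x.example"},
    # Pre-web-services Globus gatekeeper
    {"name": "site_y", "GridResource": "gt2 gatekeeper.site-y.example/jobmanager-pbs"},
]


def grid_type(route: Dict[str, str]) -> str:
    """The grid type is the first word of the GridResource string."""
    return route["GridResource"].split()[0]


def condor_c_routes(table: List[Dict[str, str]]) -> List[str]:
    """Names of routes that use schedd-to-schedd (Condor-C) submission."""
    return [route["name"] for route in table if grid_type(route) == "condor"]
```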
Routing Behind the Scenes
[Diagram: Schedd and Schedd On The Side, with routes to gatekeeper X and to schedds X2 and X3.]
• navigate internal firewalls
• provide custom routes for special users
• improve scalability
• however, keep in mind I/O requirements, etc.
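Custom routes for special users can be expressed as routes that carry an optional set of allowed job owners. This is a minimal Python sketch under that assumption. The route layout, the `routes_for` helper, and the user name "alice" are hypothetical. Only the job Owner concept comes from Condor job ads.

```python
from typing import Dict, List, Optional, Set

# Hypothetical routes: None means the route is open to every owner.
ROUTES: List[Dict[str, Optional[Set[str]]]] = [
    {"name": "X", "users": None},        # general route through gatekeeper X
    {"name": "X3", "users": {"alice"}},  # custom route reserved for one user
]


def routes_for(owner: str, routes: List[Dict]) -> List[str]:
    """Names of the routes a job owned by `owner` may use, in table order."""
    return [route["name"] for route in routes
            if route["users"] is None or owner in route["users"]]
```

Keeping the restriction on the route rather than on the job lets an administrator add or revoke a special route without touching users’ submit files.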
Future Step: Glidein Factory
[Diagram: home site with a schedd and the Schedd On The Side; glidein jobs go through gatekeeper X to start startds at site X.]
Pros:
• true late binding of jobs to resources
• may run on top of non-Condor sites
• supports the full feature set of Condor (e.g. standard universe)
Cons:
• requires GCB on the network boundary (initiated by the schedd on the side?)
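A glidein factory has to decide how many glidein jobs to send to a site. Because of late binding, the glideins are not tied to particular user jobs. Each one simply adds a startd that the home site’s jobs can be matched to once it starts. This is a minimal Python sketch of one possible policy, stated as an assumption: request glideins for idle jobs not already covered by queued glideins, capped per site. The function and its parameters are hypothetical.

```python
def glideins_to_submit(idle_jobs: int, pending_glideins: int,
                       running_glideins: int, site_cap: int) -> int:
    """Number of new glidein jobs to send to one site (hypothetical policy).

    Glideins still queued at the site are assumed to cover idle jobs already;
    the total of queued plus running glideins must stay within the site cap.
    """
    wanted = idle_jobs - pending_glideins
    room = site_cap - pending_glideins - running_glideins
    return max(0, min(wanted, room))


# 50 idle jobs, 10 glideins queued, 20 running, cap 40: only 10 more fit.
print(glideins_to_submit(50, 10, 20, 40))  # prints: 10
```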
Glideins in the Works
[Diagram: the Schedd and Schedd On The Side act as a glidein factory, reaching site X both schedd-to-schedd and schedd-to-gatekeeper.]
Pros:
• hierarchical strategy for scalability and reliability
• better match for private networks
Cons:
• may require some additional horsepower from the gatekeeper machine, perhaps a dedicated element for “edge services”.
Thanks
Interested? Let us know.
We are currently using job routing for specific users at UW.
Dan Bradley
Future development will focus on more use cases.